OpenAI says it is reviewing evidence that the Chinese start-up DeepSeek broke its terms of service by harvesting large amounts of data from its AI technology.
The San Francisco-based company, currently valued at $157 billion, said DeepSeek may have used data generated by OpenAI technologies to teach similar skills to its own systems.
This process, called distillation, is common across the AI field. However, OpenAI's terms of use state that the company does not allow anyone to use data generated by its systems to build technology that competes in the same market.
“We know that groups in the PRC are actively using methods, including what is known as distillation, to replicate advanced U.S. AI models,” said Liz Bourgeois, an OpenAI spokeswoman, in an emailed statement, referring to the People's Republic of China.
“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more,” she said. “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”
DeepSeek did not immediately respond to a request for comment.
DeepSeek stunned Silicon Valley tech companies and sent U.S. financial markets into a tailspin earlier this week after releasing AI technology that matched the performance of anything else on the market.
The conventional wisdom had been that the most powerful systems could not be built without billions of dollars' worth of specialized computer chips, but DeepSeek said it had created its technology with far fewer resources.
Like other AI companies, DeepSeek built its technology using computer code and data gathered from across the internet. AI companies lean heavily on a practice called open sourcing: freely sharing the code that underpins their technology and reusing code shared by others. They see this as a way to accelerate technological development.
Training an AI system requires huge amounts of online data. These systems learn their skills by identifying patterns in text, computer programs, images, sounds and video. The leading systems learn by analyzing almost all of the text on the internet.
Distillation is often used to train new systems. If a company takes data from another company's proprietary technology, the practice may be legally problematic. But it is often permitted with open source technology.
OpenAI is currently facing more than a dozen lawsuits accusing it of illegally using copyrighted internet data to train its systems. They include a lawsuit brought by The New York Times against OpenAI and its partner Microsoft.
The lawsuit contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information. Both OpenAI and Microsoft have denied the claims.
A Times report also found that OpenAI used speech recognition technology to transcribe the audio from YouTube videos, producing new conversational text to make its AI systems smarter. Some OpenAI employees discussed how such a move might violate YouTube's rules, according to people with knowledge of the conversations.
An OpenAI team, including the company's president, Greg Brockman, transcribed more than one million hours of YouTube video, those people said. The text was then fed into a system called GPT-4, which is widely considered one of the most powerful AI models in the world and is the basis of the latest version of the ChatGPT chatbot.