Last month, US financial markets tumbled after the Chinese start-up DeepSeek said it had built one of the world's most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.
AI companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. DeepSeek, however, said it needed only about 2,000.
In a research paper published shortly after Christmas, DeepSeek's engineers detailed the technical tricks they used to significantly reduce the cost of building their system. The engineers needed only about $6 million in raw computing power, roughly a tenth of what Meta spent building its latest AI technology.
What exactly did DeepSeek do? Here is a guide.
How is AI technology built?
The main AI technologies are based on what scientists call neural networks, mathematical systems that learn skills by analyzing huge amounts of data.
The most powerful systems spend months analyzing nearly all the English text on the internet, along with many images, sounds and other multimedia. That requires enormous amounts of computing power.
About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way to do this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also turned out to be good at running the math that powers neural networks.
The more GPUs companies pack into their data centers, the more data their AI systems can analyze.
But the best GPUs cost around $40,000 each, and they need huge amounts of electricity. Sending data between the chips can use more electricity than running the chips themselves.
How did DeepSeek reduce costs?
It did many things. Most notably, it embraced a method called "mixture of experts."
Companies typically created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.
If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.
With the mixture-of-experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller "expert" systems, and each expert could concentrate on its particular field.
Many companies have struggled with this approach, but DeepSeek found a way to do it well. Its trick was to pair those smaller "expert" systems with a "generalist" system.
The experts still needed to trade some information with one another, and the generalist, which did not understand any single subject in great detail, could help coordinate the interactions between them.
This is like an editor who oversees a newsroom full of specialized reporters.
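To make the routing idea concrete, here is a minimal sketch in Python with NumPy. The expert count, the sizes and the gating rule are arbitrary choices for illustration, and the small gating network standing in for the "generalist" is a common simplification, not DeepSeek's actual design.

```python
# A toy mixture-of-experts: each input is routed to the expert that scores highest.
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_OUT = 16, 16     # toy dimensions
N_EXPERTS = 4            # a real system might use dozens or hundreds of experts

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.normal(size=(D_IN, D_OUT)) for _ in range(N_EXPERTS)]

# A lightweight gating network scores which expert should handle each input.
gate_weights = rng.normal(size=(D_IN, N_EXPERTS))

def moe_forward(x):
    """Route each row of x to its top-scoring expert and return the outputs."""
    scores = x @ gate_weights                      # (batch, n_experts)
    chosen = np.argmax(scores, axis=1)             # index of the best expert per row
    out = np.empty((x.shape[0], D_OUT))
    for e in range(N_EXPERTS):
        rows = np.where(chosen == e)[0]
        if rows.size:                              # only the chosen expert does any work
            out[rows] = x[rows] @ experts[e]
    return out, chosen

batch = rng.normal(size=(8, D_IN))
outputs, routing = moe_forward(batch)
print("expert chosen for each input:", routing)
```

The point is in the loop: only the chosen expert touches each input, so most of the network sits idle for any given piece of data, and far less information has to shuttle between chips.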
And is it more efficient?
Much more. But that's not all DeepSeek did. It also used a simple trick that anyone who remembers elementary school math can understand.
Does this involve mathematics?
Remember your math teacher explaining the concept of pi. Pi, also written as π, is a number that never ends: 3.14159265358979…
You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. Using this simpler number still gives you a pretty good estimate of a circle's circumference.
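Here is that trade-off in a couple of lines of Python; the radius is just an example value.

```python
# Fewer decimals, nearly the same answer.
import math

radius = 10.0
exact = 2 * math.pi * radius   # circumference with full-precision pi, about 62.83
rough = 2 * 3.14 * radius      # circumference with pi shortened to 3.14, about 62.80

print(exact, rough)            # the two answers differ by well under 0.1 percent
```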
DeepSeek did something similar when training its AI technology.
The math that allows a neural network to identify patterns in text is really just multiplication. Lots and lots of multiplication. We're talking about months of multiplication across thousands of computer chips.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals off each number.
This meant that each calculation was less accurate. But that wasn't a problem. The calculations were accurate enough to generate a very powerful neural network.
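As a rough sketch of that space-versus-precision trade-off, the Python below squeezes 16-bit numbers into 8-bit slots using a generic scaled quantizer. This is only an illustration of the principle, not the exact 8-bit number format DeepSeek describes in its paper.

```python
# Store numbers in 8 bits instead of 16 and measure how much precision is lost.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(scale=0.5, size=1000).astype(np.float16)   # 16-bit originals

# Quantize: map each value to one of 256 levels (a signed 8-bit integer).
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see how far the 8-bit version drifts from the original.
restored = q.astype(np.float32) * scale
error = np.abs(restored - weights.astype(np.float32))

print("memory per value: 2 bytes -> 1 byte")
print("worst-case rounding error:", error.max())   # small relative to the weights
```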
That's it?
Well, they added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When working out the answer to each multiplication problem, a key calculation that helps determine how the neural network operates, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals, which made the answer more accurate.
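A small sketch of that idea, again in Python: the inputs live in 8 bits, but the running totals of the multiplications are kept in 32 bits so they don't overflow or lose detail. The NumPy types here only mirror the principle; the real arithmetic happens inside the GPU.

```python
# Multiply low-precision (8-bit) numbers, accumulate the results in 32 bits.
import numpy as np

rng = np.random.default_rng(2)
a = rng.integers(-127, 128, size=(256, 256), dtype=np.int8)   # 8-bit inputs
b = rng.integers(-127, 128, size=(256, 256), dtype=np.int8)

# Summing hundreds of 8-bit products quickly produces values too big for 8 bits,
# so the accumulator is widened to 32 bits before the matrix multiplication.
acc = a.astype(np.int32) @ b.astype(np.int32)

print(acc.dtype)              # int32
print(acc.min(), acc.max())   # totals far outside the 8-bit range of -128 to 127
```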
So could high school students do this?
Well, no. In their paper, DeepSeek's engineers showed that they are very good at writing the extremely complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of those chips.
Few people have that skill. But serious AI labs have the talented engineers needed to match what DeepSeek did.
So why were they not doing this already?
Some AI labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.
But others were clearly surprised by DeepSeek's work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electricity.
In other words, there is a huge amount of risk.
“You have to line up a lot of money to try something new, and often it fails,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who previously worked as an AI researcher at Meta.
“That's why we don't see much innovation. People are afraid of losing millions just to try things that don't work,” he added.
Many experts pointed out that DeepSeek's $6 million covered only what the start-up spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.
DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other AI researchers, its technical tricks are poised to significantly reduce the cost of building AI.