DeepSeek-R1: Sudden global hype in the AI community
DeepSeek-R1 has sparked global excitement in the AI community with unexpected speed, yet high-quality information about DeepSeek remains relatively scarce. On January 26, 2025, Li Guangmi, founder and CEO of Shixiang, organized a closed-door discussion on DeepSeek. The participants included dozens of leading AI researchers, investors, and top AI practitioners. They discussed DeepSeek's technical details and corporate culture, as well as the short-, medium-, and long-term impacts of the project's sudden breakthrough.
It is important to emphasize that this meeting was an unofficial technical discussion and does not reflect the views of any particular individual or institution. The well-known Silicon Valley investor Marc Andreessen called DeepSeek-R1, "as open source, a profound gift to the world." In keeping with DeepSeek's open-source spirit, the participants of this roundtable therefore decided to make their collective reflections public.
Below is a summary of the key points of the meeting.
The Mystery of DeepSeek
“The most important thing for DeepSeek is to advance intelligence.”
- Founder and CEO Liang Wenfeng is the central figure of DeepSeek. Unlike Sam Altman, he is deeply technical.
- DeepSeek has a good reputation for being the first to successfully replicate MoE and o1. Its success is based on starting early. Whether DeepSeek can deliver the best solution in the long term remains to be seen. The biggest challenge is limited resources, so the company needs to focus on the most promising areas. However, the research team and company culture are very strong. With an additional 100,000 to 200,000 GPUs, it could deliver even better results.
- DeepSeek has greatly improved its long-context capabilities in a short period of time. Even with conventional methods, it achieves long-context processing of 10K tokens.
- Scale AI's CEO Alexandr Wang claimed that DeepSeek has 50,000 GPUs, but this is an exaggeration. Public information suggests that DeepSeek has about 10,000 older A100 GPUs and possibly 3,000 H800 GPUs (purchased before the US export restrictions). DeepSeek is very conscious of compliance and has not bought any unapproved GPUs, so its stock is likely limited. By contrast, the US AI industry is more wasteful with GPU resources.
- DeepSeek focuses exclusively on a narrow area and deliberately avoids many other developments, such as multimodal models. It is not just about serving human needs, but primarily about developing intelligence itself. This focus could be a decisive factor for success.
- Quantitative trading could, in some ways, be considered DeepSeek's business model. Liang Wenfeng's other company, the quantitative investment firm High-Flyer (Huanfang), was a product of the previous generation of machine learning. For DeepSeek, the most important goal is to advance intelligence; money and monetization are a low priority. China needs leading AI labs working on results that could surpass OpenAI. Developing intelligence is a long-term process, and this year the industry will continue to diversify – new technologies must emerge.
- From a talent perspective, DeepSeek acts as a "training ground" that develops and disseminates technical talent.
- In the US, too, AI labs' business models are not sustainable; there is currently no truly functioning business model for AI, and this will have to change. Liang Wenfeng has ambitious goals – DeepSeek is not committed to any particular form, but is moving resolutely toward AGI.
- A look at DeepSeek's research shows that many of its innovations aim to save hardware costs; along several important scaling dimensions, its techniques reduce costs.
- In the long term, these savings will not greatly dampen demand for computing capacity, but in the short term the focus will be on making AI more efficient. Demand remains high, as computing resources are scarce everywhere.
- DeepSeek's organization and corporate culture
- a) Investors usually bet on recruiting the best established talent, but DeepSeek follows a different model: its team consists of bright young graduates of Chinese universities. Growing together, they could in the long run deliver results just as strong as established elite teams. Poaching individual stars is therefore not necessarily the decisive advantage.
- b) There is plenty of money in the market; the key at DeepSeek is its organizational culture. Its research culture resembles ByteDance's – it goes to the essence of problems and stays focused. A good corporate culture rests on financial stability and a long-term orientation. Only companies with strong business models can afford such cultures – both DeepSeek and ByteDance meet that condition.
- Why can DeepSeek catch up so quickly?
- a) Reasoning models require high-quality data and training. Replicating a closed-source model is particularly difficult when dealing with long texts or multimodal models. Pure reasoning models, on the other hand, have not undergone revolutionary architectural changes, which makes them easier to reproduce.
- b) DeepSeek-R1 was able to emerge quickly because the task was not extremely complex: reinforcement learning (RL) was used only to refine the model's decisions. R1 did not exceed the performance of consensus@32 (majority voting over 32 samples); in effect, it spends the equivalent of 32 times the compute sequentially, as one long exploratory chain, rather than in parallel. This did not push the intelligence frontier outward, but it made that level of capability much easier to deploy.
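To make the consensus@32 comparison concrete, here is a minimal sketch of majority voting over N sampled answers (also known as self-consistency). The `generate` callable is a hypothetical placeholder for any model API, not DeepSeek's actual pipeline.

```python
from collections import Counter
from typing import Callable

def consensus_at_n(generate: Callable[[str], str], prompt: str, n: int = 32) -> str:
    """Majority voting (self-consistency): draw n independent samples for
    the same prompt and return the most frequent final answer. This spends
    roughly n times the compute of a single answer, in parallel."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# A reasoning model like R1 spends a comparable compute budget sequentially:
# one long chain of thought that explores, backtracks, and self-corrects
# before committing to a single answer -- easier to deploy per query, but
# not necessarily beyond what consensus@32 already achieves.
```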
Explorers vs. followers: "AI resembles a step function – followers need ten times less computing power"
- AI behaves like a step function: today, followers need ten times less computing power. Followers have historically enjoyed lower computing costs, while explorers must train a large number of models. Yet research into new algorithms and architectures will never stop, and behind every step up in the function lies a huge investment of resources. The need for computing power will therefore keep growing, as will investment in product-related applications. Beyond reasoning models, many other areas also demand enormous computing capacity. The immense compute spent by explorers often remains invisible – but without that spending, there would be no next technological leap. In addition, many researchers are dissatisfied with existing architectures and RL methods and keep driving new developments.
- More computing power does not automatically lead to better results, but there is a minimum threshold. The difference between 10,000 and 1,000 GPUs may not be decisive, but with only 100 GPUs it would most likely be impossible to develop a competitive model, mainly because each iteration then takes far too long (see the back-of-the-envelope sketch after this list).
- As in physics, progress is driven by two groups: academic researchers, who explore many directions without worrying about immediate economic returns, and industrial laboratories, which focus on efficiency gains and practical applications.
- Explorers and followers call for different approaches:
- Small businesses with limited computing resources need to pay particular attention to efficiency.
- Large companies, on the other hand, optimize their strategy to develop models as quickly as possible.
- Methods that improve efficiency on clusters of 2,000 GPUs often fail in environments with tens of thousands of GPUs – at that scale, stability matters more than optimization.
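The iteration-time argument can be made concrete with a back-of-the-envelope calculation. The figures below are illustrative assumptions, not actual DeepSeek numbers, and they ignore parallelization overheads.

```python
# Illustrative only: assume a training run needs about 1,000,000 GPU-hours
# (an invented figure in a plausible order of magnitude for a large model).
GPU_HOURS_NEEDED = 1_000_000

for n_gpus in (10_000, 1_000, 100):
    days = GPU_HOURS_NEEDED / n_gpus / 24
    print(f"{n_gpus:>6} GPUs -> ~{days:,.0f} days per training iteration")

# 10,000 GPUs -> ~4 days:   many experiments per quarter.
#  1,000 GPUs -> ~42 days:  slower, but iterating is still feasible.
#    100 GPUs -> ~417 days: a single iteration takes over a year, which
# makes competitive model development practically impossible.
```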
- The CUDA ecosystem's advantages and China's catch-up strategy:
- CUDA has an advantage because of its large and complete operator library.
- Chinese companies such as Huawei rely on selectively optimizing the most frequently used operators to achieve their technological breakthrough – a late-mover advantage strategy (see the sketch after this list).
- Given the high cost of driving innovation as a frontrunner, acting as a follower is the more efficient model for many.
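As a loose illustration of why operator coverage matters, here is a hypothetical dispatcher sketch: operators with hand-optimized kernels take the fast path, while everything else falls back to a slow generic path. A late mover that optimizes only the hottest operators captures most of the practical benefit; all names below are invented for illustration.

```python
import numpy as np

# A late mover ports the "hot" operators first, since a handful of ops
# (matmul, add, ...) dominate real workloads.
OPTIMIZED_KERNELS = {
    "matmul": np.matmul,
    "add": np.add,
}

def slow_generic_op(op: str, *args):
    # Stand-in for an unoptimized fallback path. A mature ecosystem like
    # CUDA also has tuned kernels for this long tail of rarer operators.
    if op == "mul":
        return np.multiply(*args)
    raise NotImplementedError(f"operator {op!r} not implemented")

def dispatch(op: str, *args):
    # Fast vendor-optimized path if available, otherwise the slow fallback.
    kernel = OPTIMIZED_KERNELS.get(op)
    return kernel(*args) if kernel else slow_generic_op(op, *args)

# dispatch("matmul", a, b) hits the optimized kernel;
# dispatch("mul", a, b) still works, but only via the slow path.
```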
- What will be the next big catch-up target in China?
Multimodal AI could be a promising field, as GPT-5 from the West is still a long way off.

