DeepSeek, a Chinese AI firm, has released DeepSeek-V3, an open-weights large language model (LLM) with 671 billion total parameters. The model reportedly surpasses competitors such as GPT-4o and Llama 3.1 405B on key benchmarks, particularly in coding and math tasks. DeepSeek-V3 uses a mixture-of-experts architecture that activates only 37 billion parameters per token, and the company says it trained the model on second-tier graphics processing units at a fraction of the cost of its U.S. counterparts. In recent weeks, three Asian labs have introduced frontier models with open weights and permissive licenses: DeepSeek-V3, MiniMax-Text-01 with 456 billion parameters, and InternLM3-8B-Instruct. This development signals a notable advance in China's AI capabilities, and the permissive licenses let others build on these models, for example by using their outputs to label training data.
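The cost advantage comes from sparse activation: in a mixture-of-experts layer, a learned router sends each token to a small subset of expert feed-forward networks, so only a fraction of the model's total parameters do any work on a given forward pass. Below is a minimal sketch of top-k expert routing in PyTorch; the class name, dimensions, and expert count are hypothetical, and it omits DeepSeek-V3's specific refinements (such as its load-balancing strategy), illustrating only the general pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer.

    All sizes here are made up for the example; DeepSeek-V3's actual
    architecture is more involved than this sketch.
    """
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                          # (B, S, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize kept scores
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so most parameters
        # stay inactive on any single forward pass.
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                            # tokens routed to expert e
            if mask.any():
                token_mask = mask.any(dim=-1)            # (B, S)
                w = (weights * mask).sum(dim=-1)[token_mask].unsqueeze(-1)
                out[token_mask] += w * expert(x[token_mask])
        return out

layer = MoELayer()
x = torch.randn(2, 16, 512)   # (batch, seq, d_model)
y = layer(x)                  # only 2 of the 8 experts run per token
print(y.shape)                # torch.Size([2, 16, 512])
```

With top_k=2 of 8 experts, each token touches roughly a quarter of the expert parameters; the same principle, at much larger scale, is how DeepSeek-V3 activates 37 billion of its 671 billion parameters per token.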