DeepSeek, a Chinese startup known for its innovative approach to artificial intelligence, reportedly projects that AI computing requirements will grow by 100 to 1,000 times. In a paper co-authored by founder Liang Wenfeng, the company revealed that it used 2,048 Nvidia chips to train its V3 model and attributed its success to a hardware-software co-design strategy, an approach that matters as it readies its upcoming V4 and R2 models. This development follows U.S. President Donald Trump's announcement of a $500 billion public-private investment in America's AI future. Meanwhile, Nvidia's B200 has broken through prior performance barriers: independent benchmarking shows DeepSeek's R1 model being served at over 1,000 output tokens per second, more than ten times faster than some competing providers and the fastest R1 endpoint tested so far. Separately, a desktop device equipped with four RTX 4090 GPUs is reported to support unquantized models of up to 70 billion parameters. Additionally, Shanghai Goku Technologies, a Chinese quantitative fund founded in 2015, has submitted a paper on an AI training breakthrough to the Conference on Neural Information Processing Systems (NeurIPS), signaling increased activity and innovation in China's AI sector.
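The claim about running a 70-billion-parameter unquantized model on four RTX 4090s can be sanity-checked with a back-of-envelope memory calculation. The sketch below is not from the source; it assumes weights dominate memory use, "unquantized" means 16-bit precision (2 bytes per parameter), and each RTX 4090 carries 24 GB of VRAM.

```python
# Back-of-envelope VRAM estimate for hosting a large language model.
# Assumptions (not from the source article): weight storage dominates,
# fp16 = 2 bytes/parameter, int8 = 1 byte/parameter, 1 GB = 1e9 bytes.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return n_params * bytes_per_param / 1e9

params = 70e9          # 70B-parameter model
vram_gb = 4 * 24       # four RTX 4090s at 24 GB each = 96 GB total

fp16_gb = weight_memory_gb(params, 2)  # 140 GB: exceeds 96 GB of VRAM
int8_gb = weight_memory_gb(params, 1)  # 70 GB: fits, with headroom for KV cache

print(f"fp16 weights: {fp16_gb:.0f} GB, int8 weights: {int8_gb:.0f} GB, "
      f"available VRAM: {vram_gb} GB")
```

Under these assumptions, fp16 weights alone (about 140 GB) exceed the combined 96 GB of VRAM, so a fully unquantized 70B model would need CPU offloading or layer streaming; an 8-bit version, by contrast, fits comfortably.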
Nvidia has broken through prior barriers with their B200. We have conducted independent benchmarking and are seeing >1,000 output tokens/s on DeepSeek R1, more than 10X the speed of some other providers. This is the fastest R1 endpoint we have benchmarked yet. Exciting times https://t.co/SsgceyLQNV
Report: Another DeepSeek? Chinese quant fund publishes paper on AI training breakthrough. Shanghai Goku Technologies, established in 2015, submitted the paper to the Conference on Neural Information Processing Systems, an annual gathering of top scientists in machine learning https://t.co/BEdKnmQ2pJ
DeepSeek offers new details on using 2,048 Nvidia chips to train its V3 model. In a paper co-authored by founder Liang Wenfeng, the start-up attributes its success to a hardware-software co-design approach...important as the firm readies V4 and R2... https://t.co/NL4AAqctjM