
CoreWeave Inc., a cloud computing provider, has set a new industry benchmark for AI inference with NVIDIA GB200 Grace Blackwell Superchips, achieving 800 tokens per second on the Llama 3.1 405B model. The result was produced on a CoreWeave instance equipped with two NVIDIA Grace CPUs and four NVIDIA Blackwell GPUs. Peter Salanki, Chief Technology Officer at CoreWeave, emphasized the company's commitment to delivering cutting-edge infrastructure for large-model inference through its purpose-built cloud platform, positioning it as a preferred provider for leading AI labs and enterprises.

The company also reported a 40% throughput improvement over NVIDIA H100 instances with its NVIDIA H200 GPU instances, which reached 33,000 tokens per second on the Llama 2 70B model. CoreWeave became the first to offer general availability of NVIDIA GB200 NVL72-based instances this year, following its early adoption of NVIDIA H100 and H200 GPUs last year.

Beyond its AI benchmarks, CoreWeave's stock has surged following its initial public offering (IPO). After a tepid debut, shares rose 65% over two days, adding more than $7 billion to the company's market value. The rebound has been attributed to strong investor interest in AI technologies and to CoreWeave's strategic partnerships with NVIDIA and OpenAI.

The NVIDIA Blackwell platform, featuring the GB200 NVL72 system, also set new records in the latest MLPerf Inference V5.0 benchmarks, delivering up to 30x higher throughput on the Llama 3.1 405B benchmark than the NVIDIA H200 NVL8 system. The results underscore NVIDIA's leadership in AI computing infrastructure, which extends to its NVIDIA Hopper architecture and its concept of AI factories, designed to deliver accurate answers quickly and at low cost. The NVIDIA DGX B200 system with eight Blackwell GPUs tripled performance over eight NVIDIA H200 GPUs on the Llama 2 70B Interactive benchmark, which imposes a 5x shorter time per output token (TPOT) and a 4.4x lower time to first token (TTFT) than the standard Llama 2 70B test.

This MLPerf round saw 15 partners submit results on the NVIDIA platform, including ASUS, Cisco, Dell Technologies, Fujitsu, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Lambda, Lenovo, Oracle Cloud Infrastructure, Quanta Cloud Technology, Supermicro, Sustainable Metal Cloud, and VMware. The work of MLCommons to evolve the MLPerf Inference benchmark suite and provide rigorous, peer-reviewed performance data is vital for IT decision makers selecting optimal AI infrastructure. Images and video from this round were taken at an Equinix data center in Silicon Valley.
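The figures quoted above reduce to a few per-request measurements: time to first token (TTFT), time per output token (TPOT), and tokens per second. As a rough illustration only, the Python sketch below shows one common way to derive these metrics from a streamed response; the fake_token_stream generator is a hypothetical stand-in, not any vendor's API, and the values it prints have no relation to the benchmark results reported here.

import time
from typing import Dict, Iterator


def fake_token_stream(n_tokens: int = 64, delay_s: float = 0.01) -> Iterator[str]:
    # Hypothetical placeholder: yields one "token" at a time with a fixed delay.
    # Swap in a real streaming model or inference endpoint to take real measurements.
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


def measure_stream(stream: Iterator[str]) -> Dict[str, float]:
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # time to first token (TTFT) is anchored here
        n_tokens += 1
    end = time.perf_counter()

    ttft = first_token_at - start
    # TPOT: average gap between successive tokens after the first one arrives.
    tpot = (end - first_token_at) / max(n_tokens - 1, 1)
    throughput = n_tokens / (end - start)  # tokens per second for this request
    return {"ttft_s": ttft, "tpot_s": tpot, "tokens_per_s": throughput}


if __name__ == "__main__":
    print(measure_stream(fake_token_stream()))

In published benchmark rounds these metrics are aggregated over many concurrent requests under fixed latency constraints, which is why the Interactive variant of a benchmark (tighter TTFT and TPOT limits) is reported separately from the standard one.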






Guess those concerns about the @CoreWeave $CRWV IPO dissipated in record time? 🤷‍♂️ Shows the problem with our media consumption. The first-day story is everywhere... but when new data comes in and reverses the narrative, there's nothing. https://t.co/ph8UEUzlUP
Mistral AI recently released Mistral Small 3.1! 🚀📢 They claim it outperforms comparable models like Gemma 3 and GPT-4o Mini while delivering inference speeds of 150 tokens per second. Running the model on our benchmarks now, excited to see how it stacks up, @MistralAI! 🏅📊 https://t.co/pGOTFlmMGX
Wow. AMD for video gen with @higgsfield_ai was 35% cheaper and 20% faster than Nvidia https://t.co/bKyeCgDRGI