Sources
Additional media
Cerebras Systems has achieved a significant milestone in AI inference with the deployment of Meta's Llama 3.1 405B model, which runs at an output speed of 969 tokens per second. This is reported to be 12 times faster than OpenAI's GPT-4o and 18 times faster than Anthropic's Claude 3.5 Sonnet. The system delivers a time-to-first-token of just 240 milliseconds and supports a context length of 128,000 tokens with 16-bit weights. Cerebras is also preparing to launch a public inference endpoint, expanding access to this capability. These rapid performance gains have positioned Cerebras as a leader in the AI inference market, outpacing competitors such as AWS and Nvidia. Recent benchmarks indicate that Cerebras's Llama 3.1 405B deployment runs nearly twice as fast as the fastest GPU cloud runs a significantly smaller model, showcasing the advances in its Wafer Scale Engine technology.
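To make the reported figures concrete, the two headline numbers (240 ms time-to-first-token and 969 tokens per second of steady-state throughput) can be combined into a rough end-to-end latency estimate. The sketch below is illustrative only; the function name and the simple additive latency model are assumptions, not anything published by Cerebras.

```python
def estimated_generation_seconds(n_tokens: int,
                                 ttft_ms: float = 240.0,
                                 tokens_per_second: float = 969.0) -> float:
    """Rough end-to-end latency: time to first token plus steady-state decode.

    Assumes decoding proceeds at a constant rate after the first token,
    which ignores scheduling and network overhead.
    """
    return ttft_ms / 1000.0 + n_tokens / tokens_per_second

# A 1,000-token completion at the reported figures:
print(round(estimated_generation_seconds(1000), 2))  # → 1.27
```

Under this simple model, even a long 1,000-token answer completes in well under two seconds, which is what makes the throughput claim notable for interactive use.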
Groq has pushed Llama 3 70B inference to 3,200 tokens per second; three months ago, Llama 3 8B ran at 750 tokens per second. The pace of improvement is rapid, and the next generation of hardware is due to be released soon. ---- I share my learning journey here, join me and let's… https://t.co/x384Cl7PTR
“To put it into perspective, Cerebras ran the 405B model nearly twice as fast as the fastest GPU cloud ran the 1B model. Twice the speed on a model that is two orders of magnitude more complex.” Bonkers-level inference performance🤯 https://t.co/rJulxjEw72
969 tok/sec from @CerebrasSystems. Very impressive!🎉 https://t.co/dWhPRMYJuQ