OpenAI's GPT-OSS models, released in 20-billion- and 120-billion-parameter versions, now run on several specialized hardware platforms at record inference speeds. On Groq's infrastructure, GPT-OSS-20B generates 1,200 tokens per second and GPT-OSS-120B reaches 536 to 540 tokens per second, with integrated code execution and web search capabilities. Groq prices the 20B model at $0.10/$0.50 and the 120B model at $0.15/$0.75 per million input/output tokens. NVIDIA has accelerated these open-weight models on its Blackwell architecture, delivering up to 1.5 million tokens per second on an NVIDIA GB200 NVL72 system. Cerebras is also serving GPT-OSS, providing speeds of approximately 3,000 tokens per second while supporting frontier reasoning capabilities. These results highlight the collaboration between OpenAI and hardware providers Groq, NVIDIA, and Cerebras to optimize the performance and accessibility of large-scale open-weight AI models.
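The quoted prices and throughput figures can be combined into a rough per-request estimate. A minimal sketch, assuming the Groq rates are per million input/output tokens (a common API pricing convention, not stated explicitly above) and using the quoted tokens-per-second figures as sustained output throughput:

```python
# Rough cost/latency estimator for GPT-OSS on Groq, based on the figures quoted above.
# Assumptions: prices are USD per 1M input/output tokens; throughput applies to output generation.

PRICING = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-oss-20b": (0.10, 0.50),
    "gpt-oss-120b": (0.15, 0.75),
}

THROUGHPUT_TPS = {  # quoted output tokens per second on Groq
    "gpt-oss-20b": 1200,
    "gpt-oss-120b": 540,  # upper end of the 536-540 range
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request."""
    in_rate, out_rate = PRICING[model]
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    seconds = output_tokens / THROUGHPUT_TPS[model]
    return cost, seconds

cost, secs = estimate("gpt-oss-120b", input_tokens=2000, output_tokens=1000)
print(f"cost=${cost:.5f}, time={secs:.2f}s")  # -> cost=$0.00105, time=1.85s
```

At these rates a 2,000-in/1,000-out request on the 120B model costs about a tenth of a cent and completes in under two seconds, which is the practical upshot of the throughput numbers above.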
The new leader GPT-OSS-120B, located in the top left, available on @GroqInc https://t.co/Pjd7oa9Bnm
Proud to be powering search for gpt-oss on Groq 🚀 https://t.co/879aJ7VnHX
Cerebras delivers blazing speed for OpenAI's new open models with 3,000 tokens/s https://t.co/oVzWkWektK