OpenAI's GPT-OSS models, released in 20-billion- and 120-billion-parameter versions, now run on several specialized hardware platforms at record inference speeds. On Groq's infrastructure, GPT-OSS-20B generates 1,200 tokens per second and GPT-OSS-120B reaches 536 to 540 tokens per second, with integrated code execution and web search capabilities. Groq prices the 20B model at $0.10/$0.50 and the 120B model at $0.15/$0.75 per million input/output tokens. NVIDIA has accelerated these open-weight models on its Blackwell architecture, delivering up to 1.5 million tokens per second on an NVIDIA GB200 NVL72 system. Cerebras is also serving GPT-OSS, providing speeds of approximately 3,000 tokens per second while supporting frontier reasoning capabilities. These results highlight the collaboration between OpenAI and hardware providers Groq, NVIDIA, and Cerebras to optimize the performance and accessibility of large-scale open-weight AI models.
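The quoted prices and throughput figures can be combined into a rough per-request estimate. A minimal sketch, assuming the Groq rates are per million input/output tokens (a common API pricing convention, not stated explicitly above) and using the quoted tokens-per-second figures as sustained output throughput:

```python
# Rough cost/latency estimator for GPT-OSS on Groq, based on the figures quoted above.
# Assumptions: prices are USD per 1M input/output tokens; throughput applies to output generation.

PRICING = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-oss-20b": (0.10, 0.50),
    "gpt-oss-120b": (0.15, 0.75),
}

THROUGHPUT_TPS = {  # quoted output tokens per second on Groq
    "gpt-oss-20b": 1200,
    "gpt-oss-120b": 540,  # upper end of the 536-540 range
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request."""
    in_rate, out_rate = PRICING[model]
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    seconds = output_tokens / THROUGHPUT_TPS[model]
    return cost, seconds

cost, secs = estimate("gpt-oss-120b", input_tokens=2000, output_tokens=1000)
print(f"cost=${cost:.5f}, time={secs:.2f}s")  # -> cost=$0.00105, time=1.85s
```

At these rates a 2,000-in/1,000-out request on the 120B model costs about a tenth of a cent and completes in under two seconds, which is the practical upshot of the throughput numbers above.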
The new leader GPT-OSS-120B, located in the top left, available on @GroqInc https://t.co/Pjd7oa9Bnm
Proud to be powering search for gpt-oss on Groq 🚀 https://t.co/879aJ7VnHX
Cerebras delivers blazing speed for OpenAI's new open models with 3,000 tokens/s https://t.co/oVzWkWektK