Groq Inc. has been added as an official inference provider on Hugging Face’s Playground and API, giving developers direct access to the chipmaker’s custom Language Processing Units for ultra-low-latency large-language-model workloads. The integration supports state-of-the-art models including Meta’s Llama 4 and Alibaba’s Qwen3-32B. Groq says it is the only provider able to deliver Qwen3-32B’s full 131,000-token context window at real-time speeds. Usage is priced at $0.29 per million input tokens and $0.59 per million output tokens, undercutting traditional cloud offerings and positioning the partners to compete more aggressively with incumbents such as AWS and Google. The service is available immediately through Hugging Face’s Inference Providers marketplace, enabling developers to build agents, copilots and other latency-sensitive applications without additional integration work.
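For developers, the routing is exposed through the standard Hugging Face client: setting the provider argument on huggingface_hub's InferenceClient sends OpenAI-style chat-completion calls to Groq's hardware. A minimal sketch, assuming an HF_TOKEN access token in the environment and the Qwen/Qwen3-32B repo id on the Hub:

```python
import os

from huggingface_hub import InferenceClient

# Route requests to Groq via Hugging Face's Inference Providers.
# Assumes a Hugging Face access token is exported as HF_TOKEN.
client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# OpenAI-compatible chat completion; the model id is the Hub repo name.
completion = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[
        {"role": "user", "content": "Summarize the Groq LPU in one sentence."},
    ],
    max_tokens=128,
)

print(completion.choices[0].message.content)
```

Because the call goes through the Inference Providers marketplace, usage is billed to the Hugging Face account at the per-token rates above, with no Groq-specific SDK or account setup required.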
BioAge Labs Expands Drug Discovery Platform with Data from Leading European Biobank $BIOA https://t.co/HzdvYYRIwI
Agenus and Noetik Enter Collaboration to Develop AI-Enabled Predictive Biomarkers for BOT/BAL Using Foundation Models of Virtual Cell Biology https://t.co/es8DCpd97K
Today, we're excited to announce our multifaceted partnership with @G42ai to commercialize generative AI solutions powered by our flagship Liquid Foundation Models (LFM) series. Together, we'll deliver private AI across consumer electronics, investment banking, …