Groq Inc. has added the open-source Kimi K2 large language model to its GroqCloud service, offering it in public preview at a stated inference speed of 185 tokens per second. Early testers report bursts approaching 220 tokens per second, performance the company attributes to its custom inference chips.

Initial benchmark results paint a competitive picture. Kimi K2 topped a creative story-writing test and scored 3.7 on the Multi-Agent Elimination Game benchmark, trailing Grok 4's 5.9. The model also posted an average rank of 1.94 on the Thematic Generalization assessment (lower is better) and 20.4 on a confabulation-tracking benchmark.

Beyond GroqCloud, the model is freely available on Together Compute and other public evaluation platforms, broadening access for researchers and developers. The rapid rollout underscores mounting interest in high-performance open-source LLMs as alternatives to proprietary systems.
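For context, the quoted throughput figures translate into rough end-to-end latencies. A minimal sketch of that arithmetic follows; the 1,000-token response length is an illustrative assumption, not a figure from the article:

```python
# Rough latency implied by the quoted throughput figures.
STATED_TPS = 185   # tokens/second, Groq's stated rate
BURST_TPS = 220    # tokens/second, rate reported by early testers

response_tokens = 1_000  # assumed response length for illustration

# Per-token latency in milliseconds at the stated rate.
print(f"Per-token latency at {STATED_TPS} tok/s: {1000 / STATED_TPS:.1f} ms")

# Wall-clock time to stream the full response at each rate.
print(f"{response_tokens} tokens at {STATED_TPS} tok/s: {response_tokens / STATED_TPS:.1f} s")
print(f"{response_tokens} tokens at {BURST_TPS} tok/s: {response_tokens / BURST_TPS:.1f} s")
```

At the stated rate, a 1,000-token answer streams in roughly 5.4 seconds, dropping to about 4.5 seconds at the reported burst rate.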