Recent developments in the AI sector highlight competitive advancements among leading large language models (LLMs). OpenAI introduced ChatGPT Agent, extending the ChatGPT platform with strengthened security measures. Moonshot released the open-source Kimi K2 model, while Anthropic launched the Claude Tool Directory for app integrations. In benchmark testing on SimpleBench, a 200-plus-question private dataset designed to prevent memorization, xAI's Grok 4 took second place with a 60.5% score, trailing only Gemini 2.5 Pro, which remains the top performer. Grok 4 outperformed Anthropic's Claude and other models such as o3. The LLM competition is intensifying, with open-source models like Kimi, DeepSeek, and Qwen also gaining attention alongside proprietary offerings from OpenAI, Anthropic, and Meta AI.
SimpleBench results got updated: Grok 4 came in 2nd with a 60.5% score. This is quite impressive, because most SimpleBench questions remain private; their text never enters training corpora, so models cannot memorize answers or overfit. SimpleBench is a 200-plus-question https://t.co/i3P3T07xhJ
GPT vs Claude? Nah. The real battle is Kimi vs DeepSeek vs Qwen. Open-source LLMs are cooking. 🔥 Here’s the top 10 right now 👇 #ChatGPT #ClaudeAI #GeminiAI #LLaMA3 #GPT4 #Anthropic #OpenAI #MetaAI https://t.co/YdiXFjQ3Xl
Grok 4 just beat Claude & o3 on SimpleBench 👀 2nd place with 60.5%, only behind Gemini 2.5 Pro. LLM wars are heating up 🔥 #AI #Grok4 #xAI #LLM #SimpleBench https://t.co/614ZpuNutt