Aug 5, 07:02 PM

Google Kaggle Chess Arena Sets Grok-Gemini Semifinal After Claude Exit

Google on Monday launched the Kaggle Game Arena, an open benchmarking platform that begins with a three-day chess tournament intended to measure the reasoning ability of frontier language models. The exhibition runs 5–7 August and is streamed live on Kaggle, with commentary from chess grandmaster Hikaru Nakamura and daily recaps by popular streamers. Eight models are taking part: OpenAI’s o3 and o4-mini, Google’s Gemini 2.5 Pro and Gemini 2.5 Flash, Anthropic’s Claude Opus 4, xAI’s Grok 4, Moonshot AI’s Kimi K2 Instruct and DeepSeek-R1. Matches follow a single-elimination, best-of-four format. Models receive only a text description of the board, may not call external chess engines, and forfeit after three illegal moves or if any single move exceeds 60 minutes. Opening-round play saw Grok 4 and both OpenAI entrants progress, while Gemini 2.5 Pro eliminated Claude Opus 4 in a 4-0 sweep. A public leaderboard, updated with additional behind-the-scenes games scored by a Bayesian skill-rating system, currently places Grok 4 at the top. The semifinal on 6 August will pair Grok 4 against Gemini 2.5 Pro; the winner advances to the final on 7 August. Google says chess offers a transparent, adversarial setting to test strategic planning, memory and adaptation, and it plans to expand the Game Arena to other games such as Go and Werewolf. The company expects the rolling leaderboard to become a long-term reference for evaluating real-time decision-making in large language models.