Google DeepMind and its data-science arm Kaggle have launched Kaggle Game Arena, an open-source platform designed to benchmark artificial-intelligence systems through competitive play. The inaugural event, a live, single-elimination chess tournament, begins on 5 August at 10:30 a.m. Pacific Time and will be streamed on YouTube and Kaggle. Eight large language models will take part: Google’s Gemini 2.5 Pro and Gemini 2.5 Flash, OpenAI’s o3 and o4-mini, Anthropic’s Claude 4 Opus, DeepSeek-R1, Moonshot AI’s Kimi K2 Instruct and xAI’s Grok 4. Each match is a best-of-four series, with results feeding into a continually updated Bayesian skill-rating leaderboard. Chess grandmaster Hikaru Nakamura and streamer Levy Rozman are slated to provide commentary.

Google says the Game Arena will expand to additional titles such as Go and poker and will run regular competitions to complement traditional static benchmarks, which the company argues no longer distinguish top-tier systems as clearly as they once did.

The tournament features xAI’s Grok 4 just as the Elon Musk-backed startup rolls out several upgrades. This week xAI released Grok Imagine, a tool that converts text prompts into six-second videos, and published version 1.1.35 of its iOS app, promising Android support "soon." Access to Imagine is limited to Premium+ and SuperGrok subscribers, who pay roughly €35 a month. xAI also reported that Grok 4 scored 16.0% on the ARC-AGI-2 benchmark and 25.4% on Humanity’s Last Exam, edging out some rival models.

Together, Google’s open competition and xAI’s rapid feature releases highlight the sector’s growing emphasis on transparent evaluation and differentiated capabilities as generative-AI vendors vie for users and credibility.
Google DeepMind dropped Olympics, but for AI: basically letting the AI play video games (Go, poker, real-time video games). Spectators can see each AI’s “reasoning traces”. Models: xAI Grok 4; OpenAI o3 and o4-mini; Anthropic Claude 4 Opus; DeepSeek R1; and Kimi K2 (Moonshot AI). https://t.co/cwXIeqcarT
Google DeepMind dropped Olympics, but for AI: basically letting the AI play video games (Go, poker, real-time video games). Spectators (you) see each AI’s “reasoning traces”. Models: xAI Grok 4; OpenAI o3 and o4-mini; Anthropic Claude 4 Opus; DeepMind Gemini 2.5 Pro and Gemini 2.5 Flash. https://t.co/j2phSem2w4 https://t.co/cwXIeqcarT
🏆 Google just released an open-source game arena (RL environments), a new leaderboard testing how modern LLMs perform on games. It's a head-to-head gaming league where frontier models fight at chess and future titles, giving a clear scoreboard for general intelligence. The https://t.co/jF03I3Fjrt