xAI’s flagship large language model, Grok 4 Heavy, has edged out OpenAI’s next-generation GPT-5 in a recently disclosed run of the independent “Humanity’s Last Exam” benchmark, according to data circulating among AI researchers on 7 Aug. Grok 4 Heavy scored 44.4%, compared with 42.0% for GPT-5, a modest lead of 2.4 percentage points for the Musk-backed startup on one of the industry’s most widely watched stress tests of reasoning and general knowledge.
The result is notable because GPT-5 is OpenAI’s first major model upgrade since the GPT-4 series, and it arrives as the Microsoft-backed company strives to maintain its technological edge amid intensifying competition. xAI, founded in 2023, has been positioning Grok as a direct rival to OpenAI’s models while integrating the system across the X social-media platform and other services.
Benchmark scores do not always translate into real-world application quality, but the latest figures add to the pressure on incumbents as a growing field of challengers posts rapid gains in model capability. Neither company immediately commented on the benchmark comparison.
Judged by the AI benchmark Humanity's Last Exam, Grok 4 Heavy, tested 3 weeks ago, scored better than GPT-5 Pro, tested today. https://t.co/9DuGLc3MkH
BREAKING: xAI's Grok 4 Heavy outperforms OpenAI GPT-5 on Humanity’s Last Exam benchmark. https://t.co/iR5JmX29nw
GPT-5 still performs worse than Grok-4 Heavy on Humanity's Last Exam. @elonmusk & @xai can't stop winning. 💪 https://t.co/nYypynagze