Artificial-intelligence systems from OpenAI and Google DeepMind have for the first time matched the gold-medal standard at the International Mathematical Olympiad, a competition traditionally dominated by the world’s top high-school mathematicians. Both models solved five of the six problems set at the 66th edition of the contest, held this month on Australia’s Sunshine Coast, reaching the 35-point threshold required for a gold medal.

OpenAI’s unreleased experimental reasoning large language model produced natural-language proofs within the Olympiad’s two 4.5-hour sittings, without internet access or specialised tools. The company said three former IMO gold medallists independently marked the solutions and agreed on the 35/42 score. Because OpenAI did not submit the work through the competition’s official process, some organisers criticised the timing of its public announcement.

Google DeepMind, by contrast, followed the IMO’s official grading protocol. An advanced version of its Gemini model, dubbed “Deep Think”, also scored 35 points, with the Olympiad’s own coordinators certifying the result. DeepMind said the system relies on “parallel thinking” reinforcement-learning techniques that let it explore multiple solution paths at once, in contrast to last year’s silver-level entry, which depended on formal proof languages.

Although the AIs reached gold-level performance, they did not top the human field: 26 students scored higher than the models, and five competitors earned perfect 42-point scores. Still, researchers from both labs say the advance shows that general-purpose language models are approaching the level needed to assist in frontier mathematical and scientific research.

DeepMind plans to offer a test version of Gemini Deep Think to paying users, while OpenAI says it will hold back its model for several months. The twin breakthroughs intensify the rivalry between the two leading AI labs and underscore the rapid pace of progress in machine reasoning. They also raise new questions about verification standards, compute requirements and how quickly such systems can be applied beyond benchmark competitions to open research problems.
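DeepMind has not published the details of Deep Think’s “parallel thinking”, but the general pattern it describes above, sampling several independent reasoning traces and keeping the best one, can be sketched in a few lines. The sketch below is a toy illustration, not DeepMind’s method: `generate_candidate` and `score` are hypothetical stand-ins for a language-model call and a learned verifier.

```python
import concurrent.futures

def generate_candidate(problem: str, path_id: int) -> str:
    # Hypothetical stand-in for one independent reasoning trace;
    # a real system would sample a proof attempt from a language model.
    return f"candidate proof #{path_id} for: {problem}"

def score(candidate: str) -> float:
    # Hypothetical stand-in for a verifier or grader; here just an
    # arbitrary deterministic score so the example runs end to end.
    return float(sum(candidate.encode()) % 997)

def parallel_think(problem: str, n_paths: int = 8) -> str:
    """Explore several solution paths concurrently, keep the best-scoring one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(
            lambda path_id: generate_candidate(problem, path_id),
            range(n_paths),
        ))
    return max(candidates, key=score)

if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```

The selection step is the essential design choice: rather than committing to a single chain of reasoning, the system spends extra compute on breadth and lets a grading signal arbitrate between attempts.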
Code for this paper, which solves the IMO problems with Gemini 2.5 Pro, has been released. https://t.co/m3YpvpF8hy
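The linked repository contains the actual pipeline; the minimal sketch below only shows the basic setup of querying Gemini 2.5 Pro on a problem statement. It assumes the google-genai Python SDK (`pip install google-genai`) and an API key in the `GEMINI_API_KEY` environment variable; the prompt wording is illustrative and not taken from the released code.

```python
from google import genai

# Assumes GEMINI_API_KEY is set in the environment.
client = genai.Client()

problem = "..."  # paste an IMO problem statement here

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=(
        "Solve the following IMO problem and write a complete, "
        f"rigorous proof:\n\n{problem}"
    ),
)
print(response.text)
```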
At the International Mathematical Olympiad, 26 students got higher scores than DeepMind and OpenAI models, possibly the last time humans beat AI at the exam (@bzcohen / Wall Street Journal) https://t.co/OMNYxrl5Pu https://t.co/HSm2wMaZRm https://t.co/ZOzeer2dpR
The world's smartest AI models just won gold at the math Olympics. They still got beat by the world's brightest high-schoolers. https://t.co/jjSkRDZT1E