Artificial-intelligence systems from Google DeepMind and OpenAI have reached gold-medal performance on the 2025 International Mathematical Olympiad, a contest traditionally reserved for the world's top pre-university mathematicians. The milestone marks the first time general-purpose language models have matched the highest human tier in the 65-year-old competition.

Google DeepMind said an advanced version of its Gemini 2.5 model running in "Deep Think" mode solved five of the six Olympiad problems, scoring 35 points, above the 34-point cut-off typically required for a gold. The answers were graded and certified by official IMO coordinators, according to a statement that included praise from IMO president Gregor Dolinar, who called the proofs "clear, precise and easy to follow." Last year DeepMind's AlphaProof and AlphaGeometry 2 combined for a silver-level 28 points and required the problems to be translated into a formal language; this year's system worked end-to-end in natural language within the four-and-a-half-hour contest window.

OpenAI separately reported that an internal, single large language model, operating without external tools, curated data sets or internet access, also solved five problems, achieving a notional gold. While the Olympiad has not yet verified OpenAI's submission, researchers involved said the result was produced with general natural-language reasoning rather than domain-specific techniques.

The twin breakthroughs suggest rapid progress in AI mathematical reasoning just a year after silver-level results were considered state of the art. Industry observers say the feat could accelerate the use of large language models in research, engineering and education, and intensify competition between Google, OpenAI and other labs racing to commercialise next-generation systems.