Recent demonstrations of artificial intelligence capabilities have highlighted notable advances in mathematical problem-solving and strategic reasoning. At a confidential event in Berkeley, 30 leading mathematicians tried to stump OpenAI's o4-mini model with unsolved mathematical problems, but the AI reportedly solved most of them within minutes. Parallel developments in China suggest that AI models have sharply improved on the mathematics section of the national college entrance examination (Gaokao): reported scores rose from 47 points to more than 130 in one year, with one account citing a top score of 145, which puts the models ahead of most human test-takers. Doubao and DeepSeek scored highly on both the multiple-choice and the free-response sections, while Google's Gemini ranked first on the objective questions. These results suggest that AI is approaching a level of mathematical proficiency that could make it a useful collaborator in research and education. AI models have also recently competed in complex strategy games such as Diplomacy, another sign of their growing reasoning capabilities.
🇺🇸 AI MATH BOT STUNS WORLD’S TOP MINDS AT SECRET BERKELEY SHOWDOWN At a hush-hush Berkeley meetup, 30 elite mathematicians tried to stump OpenAI’s o4-mini with unsolved math problems and… they mostly failed. The AI cracked complex challenges in minutes, showing reasoning skills https://t.co/lFmGyTxhAj https://t.co/il7OmNZBFe
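I took a look at Jiqizhixin's Gaokao math test of AI models. Chinese models' reasoning ability has improved a great deal over the past year, and nearly all of them can now score above 130. Doubao and DeepSeek both scored very high on the multiple-choice and free-response questions, essentially beating most people. Doubao also scored high in both its app and its API. Gemini really is strong, ranking first on all of the objective-question tests. From https://t.co/tUcDzBEQfR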
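In one year, AI went from 47 to 145 points on Gaokao math! What happened to AI's math ability? Ever since GeekPark ran an AI math test last year, the major media outlets have all started having AI write essays and solve math problems again this year. But I suspect that after one more round of AI math tests next year, the exercise will no longer be worth running, because every major model will simply get a perfect score on Gaokao math and there will be no way left to tell them apart! https://t.co/IKrCGY4JWT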