"OpenAI has tested AI internally that scored over 90% on a MATH dataset": https://t.co/Ew5UTowrJT
90% on MATH is definitely impressive, but I still won't jump to conclusions about performance on other tasks based on this. https://t.co/UcsBRgeObv
Over 90% on math is insane. I still believe it’s Q* or an even upgraded Q* algorithm. OpenAI is cooking. https://t.co/QGDEzNug0y


OpenAI has internally tested an AI system that scored over 90% on the MATH dataset, a benchmark of competition-level math problems. The result points to strong capability on complex mathematical tasks. There is skepticism, however, about how well it carries over to other work: a new benchmark called QUAKE found that frontier AI models score only 28% on practical tasks despite achieving over 80% on standard evaluations. The timeline for a public release of the system remains uncertain, with some predictions suggesting it could be available by 2026. It is also unclear whether the system is part of the 'Strawberry' project.