OpenAI's o3 model has reached a notable milestone, scoring 25% on FrontierMath, a benchmark designed to test advanced mathematical problem-solving. The benchmark is extremely difficult: reports indicate that GPT-4 solves less than 2% of its problems. The problems were crafted by more than 60 professional mathematicians, including math professors, and are deliberately kept out of training datasets. In response to o3's result, a new suite of problems called Tier 4 has been announced, intended to exceed the difficulty of the existing set. Separately, the FineMath dataset has been released as an open resource for training AI models in mathematical reasoning.
AI is getting better at math, but we're just scratching the surface of what it will be capable of IMO (e.g., o3 only got 25% on FrontierMath). So we're super excited to release FineMath, the best open math dataset for everyone to use. Currently number one trending datasets… https://t.co/yW4Q7E2zEV
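For anyone who wants to try FineMath, here is a minimal sketch of loading it with the Hugging Face `datasets` library. The repository ID `HuggingFaceTB/finemath`, the config name `finemath-4plus`, and the `text` field are assumptions for illustration, not details confirmed in the tweet.

```python
# Sketch: stream a few FineMath examples without downloading the full corpus.
# The dataset ID, config name, and field name below are assumptions.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceTB/finemath",  # assumed repository ID
    "finemath-4plus",          # assumed config name
    split="train",
    streaming=True,            # iterate lazily instead of fetching all shards
)

for example in ds.take(3):        # peek at the first three records
    print(example["text"][:200])  # assumed text field
```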
FrontierMath, the newly released math benchmark, is so hard that even GPT-4 solves less than 2% of its problems. 60+ professional mathematicians, including math professors, crafted these problems, and they are not in any training data. OpenAI o3 did 25% on THIS. 🔧 FrontierMath details: →… https://t.co/1uWlsNFzRm
Not trying to move the goalposts here, but if the model was a step towards AGI, “General” being the key word here, I would be curious how much o3 would have gotten on the ARC-AGI eval without being trained on its training set. A few-shot prompt would be fine. Gary is mostly…
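For context on what "a few-shot prompt would be fine" means in practice, here is a hypothetical sketch of that evaluation setup: the model sees a handful of input/output grid pairs from one task in its prompt and must produce the test output, with no training on the ARC-AGI training set. Only the JSON task structure follows the public ARC format; the grid rendering and file name are illustrative.

```python
# Hypothetical sketch of a few-shot ARC-style prompt: show the solved
# training pairs from one task, then ask for the held-out test output.
import json

def format_grid(grid):
    """Render a grid of ints as space-separated rows."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def build_few_shot_prompt(task):
    """Build a plain-text prompt from one ARC task dict
    (public ARC JSON format: {"train": [...], "test": [...]})."""
    parts = ["Infer the transformation from the examples, then solve the test input."]
    for i, pair in enumerate(task["train"], 1):
        parts.append(f"Example {i} input:\n{format_grid(pair['input'])}")
        parts.append(f"Example {i} output:\n{format_grid(pair['output'])}")
    parts.append(f"Test input:\n{format_grid(task['test'][0]['input'])}")
    parts.append("Test output:")
    return "\n\n".join(parts)

with open("arc_task.json") as f:  # illustrative file name: any ARC task file
    print(build_few_shot_prompt(json.load(f)))
```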