OpenAI's o3 (high) model has achieved new state-of-the-art (SOTA) performance on multiple coding and AI benchmarks. It scored 80% on the aider polyglot coding benchmark, surpassing previous models, while o4-mini (high) scored 72%. When paired with GPT-4.1 as an editor, o3-high in architect mode reached 83% on the aider benchmark while also reducing costs compared to using o3-high alone. Additionally, o3-high secured the top position on the SimpleBench benchmark, outperforming Gemini 2.5 Pro by nearly 2% and improving 13% over OpenAI's earlier o1-high model. The o3 model also demonstrated exceptional long-context capability, achieving a perfect 100% score at 120,000 tokens. It continues to lead across various benchmarks, including LiveBench and FictionBench, reinforcing OpenAI's competitive edge in AI model quality and efficiency. The latest Aider release, version 0.82.1, supports both the o3 and o4-mini models.
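The architect/editor pairing described above can be tried directly with aider's own command-line flags. A minimal sketch, assuming aider 0.82.1 or later with o3 support enabled; the exact model name strings (`o3`, `gpt-4.1`) are assumptions and may differ depending on your provider configuration:

```shell
# Architect mode: o3 plans the change at a high level,
# while the cheaper GPT-4.1 editor model applies the edits.
# Requires OPENAI_API_KEY to be set in the environment.
aider --architect \
      --model o3 \
      --editor-model gpt-4.1
```

Splitting planning and editing this way is what the benchmark result refers to: the expensive reasoning model emits fewer tokens, and the editor model handles the mechanical file edits.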
The o3 model is currently crushing every benchmark: LiveBench, FictionBench, SimpleBench, and more. Only OpenAI has the secret sauce to take the crown back.
o3 just claimed the #1 spot on SimpleBench 🏆 OpenAI's lineup is now looking like a power play in both quality and efficiency. https://t.co/oZ18VeS7i2 https://t.co/nkPw8hEmPT
No surprise: OpenAI's o3 model saturates another benchmark. o3 shows an exceptionally strong ability to handle long contexts, achieving a perfect 100% score at 120k tokens. https://t.co/0GG67cfWHl https://t.co/9mQxkRWGW8