OpenAI's o3 (high) model has achieved new state-of-the-art (SOTA) performance on multiple coding and AI benchmarks. It scored 80% on the aider polyglot coding benchmark, surpassing previous models, while o4-mini (high) scored 72%. When paired with GPT-4.1 as an editor, o3-high in architect mode reached 83% on the aider benchmark while also reducing costs compared to using o3-high alone. Additionally, o3-high secured the top position on the SimpleBench benchmark, outperforming Gemini 2.5 Pro by nearly 2% and improving 13% over OpenAI's earlier o1-high model. The o3 model also demonstrated exceptional long-context capability, achieving a perfect 100% score at 120,000 tokens. It continues to lead across various benchmarks, including LiveBench and FictionBench, reinforcing OpenAI's competitive edge in AI model quality and efficiency. The latest Aider release, version 0.82.1, supports both the o3 and o4-mini models.
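The architect/editor pairing described above can be tried directly with aider's own command-line flags. A minimal sketch, assuming aider 0.82.1 or later with o3 support enabled; the exact model name strings (`o3`, `gpt-4.1`) are assumptions and may differ depending on your provider configuration:

```shell
# Architect mode: o3 plans the change at a high level,
# while the cheaper GPT-4.1 editor model applies the edits.
# Requires OPENAI_API_KEY to be set in the environment.
aider --architect \
      --model o3 \
      --editor-model gpt-4.1
```

Splitting planning and editing this way is what the benchmark result refers to: the expensive reasoning model emits fewer tokens, and the editor model handles the mechanical file edits.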
The o3 model is currently crushing every benchmark: LiveBench, FictionBench, SimpleBench, and more. Only OpenAI has the secret sauce to take the crown back.
o3 just claimed the #1 spot on SimpleBench 🏆 OpenAI's lineup is now looking like a power play in both quality and efficiency. https://t.co/oZ18VeS7i2 https://t.co/nkPw8hEmPT
No surprise: OpenAI's o3 model saturates another benchmark. o3 shows an exceptionally strong ability to handle long contexts, achieving a perfect 100% score at 120k tokens. https://t.co/0GG67cfWHl https://t.co/9mQxkRWGW8