OpenAI's o3-mini-high model has emerged as the leading choice for coding tasks, outperforming competitors such as DeepSeek R1, o1, and Claude 3.5 Sonnet across various benchmarks. On LiveBench, o3-mini-high achieved a coding average of 82.74, well ahead of o1's 69.69, Claude 3.5 Sonnet's 67.13, and DeepSeek R1's 66.74. That performance, combined with its speed and cost-effectiveness—roughly half the price of Claude 3.5 Sonnet and about one-fifteenth the price of o1, while running about 5 times faster than comparable models—is expected to shift coding workloads towards o3-mini-high.
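The cost multiples quoted above depend on list prices and on the input/output token mix of a workload. A minimal sketch of how such ratios are derived—note the per-million-token prices and the 50/50 token split below are assumptions for illustration, not authoritative figures:

```python
# Illustrative cost-ratio calculation. The per-million-token prices and the
# assumed 50/50 input/output token split are placeholders; substitute current
# figures from each provider's pricing page.
PRICES = {                      # (input $/M tokens, output $/M tokens) - assumed
    "o3-mini":           (1.10, 4.40),
    "o1":                (15.00, 60.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}

def blended_price(model: str, output_share: float = 0.5) -> float:
    """Blend input and output prices by an assumed output-token share."""
    inp, out = PRICES[model]
    return (1 - output_share) * inp + output_share * out

def cost_ratio(model: str, baseline: str = "o3-mini") -> float:
    """How many times more expensive `model` is than `baseline`."""
    return blended_price(model) / blended_price(baseline)

for m in ("o1", "claude-3.5-sonnet"):
    print(f"{m}: {cost_ratio(m):.1f}x the cost of {('o3-mini')}")
```

With these assumed prices the blended ratios come out near the multiples quoted above; the exact figure shifts with the output-token share, since reasoning models emit many more output tokens than they consume.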
o3-mini and o3-pro imply the existence of o3-pro-max
Currently, o3-mini-high is the optimal choice for my backend coding, surpassing both DeepSeek R1 and o1-pro in performance.
Ran this eval again with o3 mini high and it sets a clear SOTA of 32/38 https://t.co/9MT00sJpWt https://t.co/mkY4GvL8iV