OpenAI has released Codex, powered by Codex-1, a fine-tuned version of the o3 model. Codex-1 scores 72.1% on the SWE-bench Verified coding benchmark, edging past the 71.7% reported for o3 in December. At pass@8, Codex-1 reaches 83.86%, ahead of Sonnet 3.7 Thinking's 70.3%. The new model is distinct from both the original Codex model released in 2021 and the Codex CLI tool released last month. OpenAI has also shared the Codex-1 system message to help developers understand the model's default behavior and customize it for their own workflows. Taken together, the results mark a notable step forward in OpenAI's software-engineering benchmark performance.
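For readers unfamiliar with the metric, pass@k estimates the probability that at least one of k sampled solutions to a task passes its tests. The sketch below uses the unbiased estimator introduced with OpenAI's original Codex/HumanEval evaluation; the sample counts are illustrative and are not the settings OpenAI used for the scores above.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples of which c are
    correct, the probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # too few failing samples: any size-k draw contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 16 samples per task, 4 passing, k = 8
print(round(pass_at_k(n=16, c=4, k=8), 4))  # 0.9615
```

A benchmark figure like the 83.86% above corresponds to this quantity averaged over all tasks in the suite.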
o3-preview in December is not the same model that was released to prod. There were many multiples of compute used for it. If a released version of codex beats o3-preview, it is quite the achievement. https://t.co/f5D6CcbjAv
codex (n.)
1. an ancient manuscript
2. a task-based coding agent from @OpenAI that writes clean diffs, runs tests, and skips the small talk
https://t.co/8lypPXthZw