OpenAI has released Codex, powered by Codex-1, a fine-tuned version of the o3 model. Codex-1 scores 72.1% on the SWE-bench Verified coding benchmark, edging past the 71.7% reported for o3 in December. At pass@8, Codex-1 reaches 83.86%, ahead of Sonnet 3.7 Thinking's 70.3%. The new model is distinct from both the original Codex model released in 2021 and the Codex CLI tool released last month. OpenAI has also shared the Codex-1 system message to help developers understand the model's default behavior and customize it for their own workflows. Taken together, the results mark a notable step forward in OpenAI's software-engineering benchmark performance.
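For readers unfamiliar with the metric, pass@k estimates the probability that at least one of k sampled solutions to a task passes its tests. The sketch below uses the unbiased estimator introduced with OpenAI's original Codex/HumanEval evaluation; the sample counts are illustrative and are not the settings OpenAI used for the scores above.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples of which c are
    correct, the probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # too few failing samples: any size-k draw contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 16 samples per task, 4 passing, k = 8
print(round(pass_at_k(n=16, c=4, k=8), 4))  # 0.9615
```

A benchmark figure like the 83.86% above corresponds to this quantity averaged over all tasks in the suite.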
o3-preview in December is not the same model that was released to prod. There were many multiples of compute used for it. If a released version of codex beats o3-preview, it is quite the achievement. https://t.co/f5D6CcbjAv
codex (n.)
1. an ancient manuscript
2. a task-based coding agent from @OpenAI that writes clean diffs, runs tests, and skips the small talk
https://t.co/8lypPXthZw