
Anthropic Releases Claude Opus 4.1 With 74.5% on SWE-bench Verified, Outperforming OpenAI o3 and Gemini 2.5 Pro
Anthropic has released Claude Opus 4.1, an upgrade to its flagship model Claude Opus 4 that focuses on agentic tasks, real-world coding, and complex reasoning. The update is available to paid users at no additional cost via Claude Code, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. On the SWE-bench Verified benchmark, Opus 4.1 scores 74.5%, up from Opus 4's 72.5%, and Anthropic reports that it outperforms competitors such as OpenAI's o3, Gemini 2.5 Pro, and Qwen3-Coder on coding and agentic tasks; the company characterizes the gain as roughly a one-standard-deviation improvement over its predecessor.

Key strengths include multi-file code refactoring, debugging, analytics, and improved context understanding that yields more accurate and helpful responses. The release marks Anthropic's quickest upgrade cycle to date, arriving roughly two months after Opus 4, and the company says substantially larger improvements are coming in the weeks ahead. The model is already integrated into platforms such as Poe, and early users have praised its solid coding capabilities and the steady stream of enhancements delivered through Claude Code.
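For readers who want to try the new model programmatically, the sketch below shows a minimal request through the Anthropic Python SDK. The model identifier string is an assumption based on Anthropic's published naming convention; consult the current API documentation for the exact id.

```python
# Minimal sketch of calling Claude Opus 4.1 via the Anthropic Python SDK.
# The model id below is an assumed alias; verify it against Anthropic's docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed alias for the dated model snapshot
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Refactor this function to remove the duplicated branch."}
    ],
)
print(response.content[0].text)
```

On Amazon Bedrock and Vertex AI the same model is typically exposed under platform-specific identifiers, so only the client setup and model string would change.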
Sources
- Robin Ebers
Claude Code in two words: ROCK. SOLID. That's it. That's the post. While some hype, Anthropic ships non-stop.
- Haider.
Unfortunately, with each model release, I've earned the badge of "lab fanboy": tested Gemini 2.5 Pro for a week -- incredible at coding and instruction following; spent $100 on Claude Code -- excellent at agentic coding; tried Grok 4 in the API -- amazing at creative writing...
- Luc Pimentel
As much as I like Claude Code, I'm back to using Cursor... I don't like the UX of reviewing diffs through the CLI. I noticed my code quality drops a lot because I can't be bothered to check the new code and just YOLO it. The good news is that Cursor agents are working really well now.