OpenAI’s newly released GPT-5 model is drawing mixed reviews from software developers, who say the system excels at technical reasoning and project planning but lags rivals on raw coding accuracy. Early users interviewed by WIRED and discussing their tests online report that Anthropic’s latest Claude Opus 4.1 and Claude Sonnet 4 continue to generate cleaner, more reliable code.

Cost is emerging as GPT-5’s principal advantage. Sayash Kapoor, a Princeton University researcher benchmarking large language models, says a standard SWE-bench test costs about $30 to run with GPT-5 set to medium verbosity, compared with roughly $400 for the same test on Claude Opus 4.1. Yet in Kapoor’s trials, GPT-5 reproduced results from scientific papers only 27 percent of the time, versus 51 percent for Opus.

Some engineers praise GPT-5’s ability to digest complex briefs and return end-to-end solutions in a single pass, but others criticise its tendency to generate redundant code and hallucinate details such as URLs. Anthropic argues that real-world performance depends on outcome-based pricing, noting that highly deliberative models can quickly consume tokens. The early feedback underscores a trade-off developers face: lower operating costs with GPT-5 versus higher accuracy from competing models.
GPT-5 is a really good model for coding and also much cheaper compared to Anthropic Opus 4.1. If Anthropic doesn’t find a way to lower their prices, people will easily switch to GPT-5 without a doubt. https://t.co/RIduH6wyPq https://t.co/qkOMoXaDns
Thoughts on my second test with GPT-5 vs Sonnet-4 🤔 Task was to implement “Gemini multi speaker speech API” Neither model knew how from training data GPT-5 excels at searching for and learning from new information ✅ whereas Sonnet-4 fails at this unfortunately 😔(2 tests) https://t.co/ptzExXVzoA https://t.co/w2goXwt8sc