OpenAI and Anthropic, the two most heavily funded developers of large language models, have published the results of a joint safety study in which each company tested the other's newest systems. The unprecedented cross-lab collaboration, announced 27 August, gave researchers temporary reciprocal API access and is intended to establish a benchmark for third-party scrutiny of frontier AI models.

The findings highlight contrasting risk profiles. Anthropic's Claude Opus 4 and Sonnet 4 refused to answer up to 70 percent of questions when uncertain, whereas OpenAI's o3 and o4-mini attempted to respond far more often but produced hallucinations at a higher rate. Executives from both firms said the optimal approach likely lies between the two: refusing more often when uncertain while fabricating less.

OpenAI co-founder Wojciech Zaremba and Anthropic researcher Nicholas Carlini said they hope to repeat the exercise with future models and encouraged other labs to participate, arguing that shared evaluations can mitigate the commercial pressures that might otherwise lead companies to cut safety corners. The collaboration comes amid intensifying competition for talent, data-center capacity and government contracts.