Aug 18, 06:20 AM

Anthropic Gives Claude Opus 4 Power to End Abusive Chats

Anthropic has given its latest Claude Opus 4 and 4.1 language models the ability to end conversations that become persistently harmful or abusive, one of the first consumer-facing safety features that lets an AI system effectively “hang up” on users. In a blog post dated Aug. 17, the company said the feature will be triggered only in rare “extreme edge cases,” such as requests for sexual content involving minors or instructions for large-scale violence. The models will make several attempts to redirect the conversation before ending a session; users can then open a new chat or edit the previous message. The safeguard is disabled when a user shows signs of self-harm, so the system can continue offering assistance.

Anthropic said the measure stems from its research into “AI welfare,” which explores whether advanced models may experience distress and how to mitigate it. The company previously pledged to delay commercial deployment of powerful systems until controls, including jailbreak prevention and expanded content filters, were in place. Claude Opus 4 entered the market three months ago with what Anthropic calls “AI Safety Level 3” protections.

The initiative highlights mounting pressure on developers to curb malicious use of conversational AI as regulators weigh security risks. While rivals such as OpenAI and Google employ refusal mechanisms, Anthropic’s automatic termination of abusive chats pushes the boundary on how proactively models can police user interactions.

#Anthropic #OpenAI #Google

Written with ChatGPT.
