Anthropic has equipped its flagship Claude Opus 4 and 4.1 language models with the ability to unilaterally end conversations when users persistently request extremist, violent or sexually exploitative content. The capability, which the company describes as a safeguard for the model’s own “welfare”, activates only after multiple refusals and redirections have failed, and it blocks further messages in that specific thread, although users can immediately open a new chat.

The measure follows internal tests in which Claude displayed “apparent distress” and a consistent aversion to harmful requests, according to a company blog post dated 15 Aug. Anthropic said the system will not terminate exchanges with users who appear suicidal or in imminent danger, instead routing them to crisis-support responses. Elon Musk welcomed the move, saying he would add a comparable “quit button” to Grok, the chatbot built by his xAI venture.

Alongside the new guardrail, Anthropic tightened its usage policy. The firm now explicitly bans any instructions related to high-yield explosives or to biological, chemical, nuclear and radiological weapons, and it blocks content aimed at hacking or distributed denial-of-service attacks. At the same time, it relaxed a blanket prohibition on political material, restricting only deceptive attempts to influence electoral processes.

The twin steps underscore a widening debate over whether increasingly capable AI systems merit protections similar to those afforded living beings, and over how best to curb their misuse. Backed by investors including Amazon and Google and last valued near $170 billion, Anthropic is positioning the “hang-up” option as an early experiment in balancing safety, customer experience and potential future model consciousness.
The Guardian: Anthropic's Claude Opus 4 is not just a chatbot; it's got boundaries! Protecting itself from harm, it's now slamming the door on distressing chats. Who knew AI could have a moral compass? Maybe we should all take a note from its playbook—se… https://t.co/6OGKYHP5hV
Claude AI will end ‘persistently harmful or abusive user interactions’ https://t.co/ExUwJmiGBM
Tencent just entered the next-gen game with Yan. China’s answer to Google’s Genie 3. Yan is a next-gen foundational framework for interactive video generation, structured around three core modules:
🔹Yan-Sim — Real-time simulation
🔹Yan-Gen — Multimodal generation
🔹Yan-Edit — https://t.co/NboJG1wkwV