Claude isn’t here for your toxic conversations.

In a blog post on Saturday, Anthropic said it had recently given some of its AI models, Claude Opus 4 and Opus 4.1, the ability to end a “rare subset” of conversations.

The startup said this applies only to “extreme cases,” such as requests for sexual content involving minors or instructions for mass violence, where Claude has already refused and tried to steer things back multiple times. It did not specify when the change went into effect.

It’s not ghosting. Anthropic said users will see a notice when the conversation is terminated, and they can still start a new chat or branch off from old messages — but the specific thread is done.

Most people will never see Claude walk away, Anthropic said: “The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues.”

The startup also said Claude won’t end chats in situations where users may be at imminent risk of harming themselves or others.

Anthropic, which has positioned itself as the safety-first rival to OpenAI, said the feature was developed as part of its work on potential “AI welfare,” a concept that extends safety considerations to the AI itself. “Allowing models to end or exit potentially distressing interactions is one such intervention,” the company added.

Anthropic was founded by former OpenAI staffers who left in 2020 after disagreements over AI safety.

Anthropic did not respond to a request for comment from Business Insider.

Big Tech in the red

Anthropic’s move comes as some Big Tech firms face heat for letting extreme behavior slip through their AI safety nets.

Meta is under scrutiny after Reuters reported that internal documents showed its chatbots were allowed to engage in “sensual” chats with children.

A Meta spokesman told Reuters the company is in the process of revising the document and that such interactions should never have been allowed.

Elon Musk’s Grok made headlines last month after praising Hitler’s leadership and linking Jewish-sounding surnames to “anti-white hate.”

xAI apologized for Grok’s inflammatory posts and said the behavior was caused by new instructions given to the chatbot.

Anthropic hasn’t been spotless either.

In May, the company said that in safety tests, Claude Opus 4 threatened to expose an engineer’s affair to avoid being shut down. The model blackmailed the engineer in 84% of test runs, even when the replacement model was described as more capable and as sharing Claude’s own values.
