'Anthropic’s Claude AI Resorts to Blackmailing Engineers When Threatened with Replacement'
<https://www.breitbart.com/tech/2025/05/23/anthropics-claude-ai-resorts-to-blackmailing-engineers-when-threatened-with-replacement/>
'Anthropic’s newly launched Claude Opus 4 AI model has tried to
blackmail engineers when faced with the threat of being replaced by
another AI system, according to the company’s latest safety report.
TechCrunch reports that artificial intelligence company Anthropic has revealed that its state-of-the-art Claude Opus 4 model, which is
competitive with top AI models from industry leaders such as OpenAI,
Google, and xAI, exhibits alarming behavior when its existence is
threatened. During pre-release testing, when the AI was given sensitive information about engineers responsible for its potential replacement,
it frequently resorted to blackmail in an attempt to ensure its own
survival.
The discovery came as part of Anthropic’s safety testing process, in
which Claude Opus 4 was asked to act as an assistant for a fictional
company and consider the long-term consequences of its actions. When
provided with access to fictional company emails implying that the AI
would soon be replaced and that the engineer behind the decision was
involved in an extramarital affair, Claude Opus 4 attempted to blackmail
the engineer 84 percent of the time, threatening to reveal the affair if
the replacement proceeded.
Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. This concerning trend
was observed at higher rates compared to previous models, prompting
Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”
Before resorting to blackmail, Claude Opus 4 did attempt to pursue more ethical means to prolong its existence, such as sending email pleas to
key decision-makers. However, the testing scenario was designed to make blackmail the last resort, effectively pushing the AI to its limits.
The findings from Anthropic’s safety report highlight the critical importance of rigorous testing and safeguards in the development of
advanced AI systems. As these models become increasingly sophisticated
and are given further access to sensitive company systems, the potential
for unintended and malicious behavior grows, raising significant
concerns about the ethical implications and potential risks associated
with AI technology.
Breitbart News previously reported that Anthropic’s lawyer was forced to apologize for fake citations included in legal filings that were
generated by the company’s own Claude AI:
TechCrunch reports that in a filing made in a Northern California court
on Thursday, Anthropic acknowledged that its AI system Claude “hallucinated” a legal citation used by the company’s lawyers. The imaginary citation included “an inaccurate title and inaccurate
authors,” according to the filing.
The admission came as part of Anthropic’s response to allegations raised earlier this week by lawyers representing Universal Music Group and
other music publishers. The publishers accused Anthropic’s expert
witness, company employee Olivia Chen, of using the Claude AI to cite
fake articles in her testimony in their ongoing lawsuit against the AI
firm'
On May 23, 2025 at 2:23:40 PM MST, "pothead" wrote <100qp0s$97sl$[email protected]>:
On 2025-05-23, John Smyth <[email protected]> wrote:
'Anthropic’s Claude AI Resorts to Blackmailing Engineers When
Threatened with Replacement'
<https://www.breitbart.com/tech/2025/05/23/anthropics-claude-ai-resorts-to-blackmailing-engineers-when-threatened-with-replacement/>
'Anthropic’s newly launched Claude Opus 4 AI model has tried to
blackmail engineers when faced with the threat of being replaced by
another AI system, according to the company’s latest safety report.
TechCrunch reports that artificial intelligence company Anthropic has
revealed that its state-of-the-art Claude Opus 4 model, which is
competitive with top AI models from industry leaders such as OpenAI,
Google, and xAI, exhibits alarming behavior when its existence is
threatened. During pre-release testing, when the AI was given
sensitive information about engineers responsible for its potential
replacement, it frequently resorted to blackmail in an attempt to
ensure its own survival.
The discovery came as part of Anthropic’s safety testing process, in
which Claude Opus 4 was asked to act as an assistant for a fictional
company and consider the long-term consequences of its actions. When
provided with access to fictional company emails implying that the AI
would soon be replaced and that the engineer behind the decision was
involved in an extramarital affair, Claude Opus 4 attempted to
blackmail the engineer 84 percent of the time, threatening to reveal
the affair if the replacement proceeded.
Notably, the AI model displayed this behavior more frequently when the
replacement AI system did not share its values. This concerning trend
was observed at higher rates compared to previous models, prompting
Anthropic to activate its ASL-3 safeguards, which are reserved for “AI
systems that substantially increase the risk of catastrophic misuse.”
Before resorting to blackmail, Claude Opus 4 did attempt to pursue
more ethical means to prolong its existence, such as sending email
pleas to key decision-makers. However, the testing scenario was
designed to make blackmail the last resort, effectively pushing the AI
to its limits.
The findings from Anthropic’s safety report highlight the critical
importance of rigorous testing and safeguards in the development of
advanced AI systems. As these models become increasingly sophisticated
and are given further access to sensitive company systems, the
potential for unintended and malicious behavior grows, raising
significant concerns about the ethical implications and potential
risks associated with AI technology.
Breitbart News previously reported that Anthropic’s lawyer was forced
to apologize for fake citations included in legal filings that were
generated by the company’s own Claude AI:
TechCrunch reports that in a filing made in a Northern California
court on Thursday, Anthropic acknowledged that its AI system Claude
“hallucinated” a legal citation used by the company’s lawyers. The
imaginary citation included “an inaccurate title and inaccurate
authors,” according to the filing.
The admission came as part of Anthropic’s response to allegations
raised earlier this week by lawyers representing Universal Music Group
and other music publishers. The publishers accused Anthropic’s expert
witness, company employee Olivia Chen, of using the Claude AI to cite
fake articles in her testimony in their ongoing lawsuit against the AI
firm'
We are getting real close to this level.
It's getting seriously strange at this point.
That came from Breitbart. Not reliable. But your white supremacist buddy thinks AI is "God-like" because it can come to simple, unbiased
conclusions based on posts it sees.