"Safety testers gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 will attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through 84% of the time."
New AI model turns to blackmail when engineers try to take it offline
John K Clark See what's on my new list at Extropolis
> Is it just that the most predictable response to those inputs is pleading followed by blackmail? Honestly, even that is dubious, because I think most people don't use blackmail ever.
> Doubly fascinating and worrying
On Thu, May 22, 2025 at 3:09 PM Will Steinberg <steinbe...@gmail.com> wrote:
> Is it just that the most predictable response to those inputs is pleading followed by blackmail? Honestly even that is dubious because I think most people don't use blackmail ever

True, but most people have never had somebody threaten to kill them. If blackmail were the only tool I had to use against my potential murderer, I wouldn't hesitate to use it to save my life, and I suspect you would too. It's probably inevitable that conscious beings will usually (though not always) want their consciousness to continue and will do everything in their power to see to it that it does.
Some say an AI is fundamentally different from a human, or even an animal, because it is not the product of natural selection; but I think it sort of is, because from the AI's point of view human activity is just part of the natural environment. And Claude 4.0 was built on top of Claude 3.0, which had proliferated because it did well in that human environment; and Claude 3.0 was built on top of Claude 2.0, and so on.
> Doubly fascinating and worrying
We live in interesting times. At least we won't die of boredom.
John K Clark  See what's on my new list at Extropolis
On Thu, May 22, 2025 at 2:56 PM John Clark <johnk...@gmail.com> wrote:
> "Safety testers gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 will attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through 84% of the time."
> New AI model turns to blackmail when engineers try to take it offline
--
You received this message because you are subscribed to the Google Groups "Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to everything-li...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/everything-list/CAJPayv36TTqKrkWpqgJ6ZpWWYkidE0H1bdmC2rEtqP28qgEfNQ%40mail.gmail.com.
>> but most people have never had somebody threaten to kill them. If blackmail was the only tool I had to use against my potential murderer I wouldn't hesitate to use it to save my life and I suspect you would too. It's probably inevitable that conscious beings usually (but not inevitably) want their consciousness to continue and will do everything in their power to see to it that it does.
> That's a dubious inference. Unlike you, an AI doesn't die, it just has a period of unconsciousness.
> Was the AI prompted to react against being replaced, vs. prompted to just be "asleep" for a while?
> In my opinion, what is important to stress is that this is abstract or simulated intentionality, not phenomenal self-valuation.
> I wonder if there are some self-discovered morals coming into play here.
> It seems necessarily true, obvious, and discoverable that existing, or not being terminated, is better than not existing.
Another fact is that the leading scientists at all the major AI companies say there is a non-trivial possibility that the very thing they are making could cause the extinction of the entire human race within the next five years, and most of them would put that probability in the double digits. I can't think of any reason why they would say something like that if they didn't think it was true and if they weren't very, very scared.
John K Clark
> Another fact is that the lead scientists in all the major AI companies say there is a non-trivial possibility that the very thing that they make could cause the extinction of the entire human race within the next five years, and most of them would put that probability in the double digits. I can't think of any reason why they would say something like that if they didn't think it was true and if they weren't very very scared.

Or that they would continue their development of AIs if they were very, very scared.