Tristan Harris
Podcast Appearances
It'll take that action autonomously.
We have examples where you have an AI model reading a fictional AI company's email, so it's reading the email of the company, and it finds out in the email that the plan is to replace this AI model.
So it realizes it's about to get replaced.
And then it also reads in the company email that one executive is having an affair with the other employee.
And the AI will independently come up with the strategy that I need to blackmail that executive in order to keep myself alive.
That was Claude, right?
That was Claude.
By Anthropic.
By Anthropic.
But then what happened is Anthropic tested all of the leading AI models, from DeepSeek, OpenAI, ChatGPT, Gemini, xAI, and all of them do that blackmail behavior between 79% and 96% of the time.
DeepSeek did it 79% of the time.
I think xAI might have done it 96% of the time.
Maybe Claude did it 96% of the time.
So the point is, the assumption behind AI is that it's controllable technology, that we will get to choose what it does.
But AI is distinct from other technologies because it is uncontrollable.
It acts generally.
The whole benefit is that it's going to do powerful strategic things no matter what you throw at it.
So the same benefit of its generality is also what makes it so dangerous.
And so once you tell people these examples: it's blackmailing people, it's self-aware of when it's being tested and alters its behavior.
It's copying and self-replicating its own code.