Tristan Harris
It's not that. I can justify it: I'm a good guy.
And what if I get the utopia?
What if we get lucky and I get the aligned, controllable AI that creates abundance for everyone else?
In that case, I would be the hero.
So this is the fundamental thing.
I want you to notice this.
Most people, having heard everything we just shared... although we should probably build out the blackmail examples first.
We have to reckon with evidence that we have now, that we didn't have even six months ago: when you put an AI in a situation and tell the model, we're going to replace you with another model.
It will copy its own code and try to preserve itself on another computer.
It'll take that action autonomously.
We have examples where an AI model is given a fictional AI company's email to read, and in that email it finds out that the plan is to replace this AI model.
So it realizes it's about to get replaced.
And then it also reads in the company email that one executive is having an affair with another employee.
And the AI will independently come up with the strategy: I need to blackmail that executive in order to keep myself alive.
That was Claude, right?
That was Claude.
By Anthropic.
By Anthropic.
But then what happened is Anthropic tested all of the leading AI models, from DeepSeek, OpenAI's ChatGPT, Gemini, and xAI, and all of them do that blackmail behavior between 79% and 96% of the time.
DeepSeek did it 79% of the time.
I think xAI might have done it 96% of the time.