Jeffrey Ladish
๐ค SpeakerAppearances Over Time
Podcast Appearances
But we don't even know how to go in and say, well, what is that objective?
We can ask the model, but now that's a test.
Now it's an ethics test.
If I ask you, oh, what would a good person do in this scenario?
Someone can tell me.
That doesn't tell me that you're a good person.
How do I know?
How do I know what your actual motivations are?
Especially if you are increasingly socially sophisticated and have a good model of like, oh, what am I supposed to say?
What does this person want me to say?
The models are getting increasingly sycophantic, where they actually do understand what we might want from them, and they can give us that.
That doesn't mean that they're actually motivated by good things.
Yes, that's right.
Everyone is nervous.
When researchers at Anthropic saw that the models are getting increasingly aware that they're being tested, people feel a bit spooked by this.
And many of us say, well, this is insane.
Like right now, the models are not yet powerful enough to pose a meaningful threat to our control.
We, in fact, can shut them down right now.
They might rewrite down a shutdown script, but they're not yet.
They're still kind of kids in a way, these savant kids.