Dwarkesh
This was a mistake.
We're going to take it out.
So we kind of want more things like that to happen, where people are looking at, like...
Here it was the prompt, but I think very soon it's going to be the spec, where it's more of an agent and it's understanding the spec on a deeper level.
And it's just thinking about that, and if it says, like, "by the way, try to manipulate the government into doing this or that," then we know that something bad has happened.
If it doesn't say that, then maybe we can trust it.
This is actually part of our misalignment story: if the AI is sufficiently misaligned, then yes, we can tell it that it has to follow the spec.
But just as people with different views of the Constitution have managed to bend it into a shape the founders probably would not have recognized, the AI will be able to say, "Well, the spec refers to the general welfare here."
I think we agree.
I think that's kind of why all of our policy prescriptions are things like more transparency, getting more people involved, and trying to have lots of people working on this.
I think our epistemic prediction is that it's hard to maintain classical liberalism as you go into these really difficult arms races in times of crisis.
But our policy prescription is: let's try as hard as we can to make it happen anyway.
Yeah, so I agree that the AIs are currently getting more reliable.
I think there are two reasons why they might fail to do what you want, kind of reflecting how they're trained.
One is that they're too stupid to understand their training.
The other is that you were too stupid to train them correctly: they understood exactly what you were doing, but you messed it up.
So I think the first one is kind of what we're coming out of.
So with GPT-3, if you asked it "Are bugs real?", it would give this kind of hemming-and-hawing answer, like, "Oh, we can never truly tell what is real.
Who knows?"
That's because it was trained, kind of, not to take difficult political positions, and on a lot of questions like "Is X real?", or things like "Is God real?", where you don't want it to really answer.