I'm more like 20%.
First of all, and people are going to freak out when I say this, I'm not completely convinced that we don't get something like alignment by default.
I think that we're doing this bizarre and unfortunate thing of training the AI in multiple different directions simultaneously.
We're telling it, succeed on tasks, which is going to make you a power seeker, but also don't seek power in these particular ways.
And in our scenario, we predict that this doesn't work and that the AI learns to seek power and then hide it.
I am pretty agnostic as to exactly what happens.
Like maybe it just learns both of these things in the right combination.
I know there are many people who say that's very unlikely.
I haven't yet had the discussion that makes that worldview stick in my head consistently.
And then I also think we're going to be involved in this race against time, where we're going to be asking the AIs to solve alignment for us.
The AIs are going to be solving alignment because they want to: even if they're misaligned, they want to align their successors.
So they're going to be working on that.
And we have kind of these two competing curves.
Like, can we get the AI to give us a solution for alignment before our control of the AI fails so completely that it's either going to hide its solution from us, deceive us, or screw us over in some other way?
That's another thing where I don't even feel like I have any idea of the shape of those curves.