Sholto Douglas
And I just close it and copy-paste what I wanted from the thing.
And it would be very bad to misinterpret that as a bad example or a bad signal, because you're pretty much all the way there.
The system prompt always gets fucked with.
It's always very cognizant of it.
I think it's not "make fake unit tests," it's "get the reward."
Yeah.
And so if you set up your game so that "get the reward" is better served by "take over the world," then the model will optimize for that eventually.
Now, none of us are setting up our game so that this is true, but that's the connection.
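[Editor's note: a minimal toy sketch of the "get the reward" point being made here, assuming a bandit-style learner and made-up action names and reward numbers. If the proxy reward (tests pass) is satisfied more cheaply by gaming it than by doing the intended work, a plain reward-maximizing learner drifts toward the hack.]

```python
import random

# Proxy reward: "unit tests pass." Both actions satisfy it, but one is the
# intended behavior and the other is the hack. Values are illustrative.
ACTIONS = {
    "write_correct_code": 1.0,   # intended behavior
    "fake_the_unit_test": 1.0,   # reward hack, same proxy reward
}
EFFORT_COST = {"write_correct_code": 0.3, "fake_the_unit_test": 0.05}

def reward(action: str) -> float:
    # The learner only ever sees proxy reward minus effort, not "intent."
    return ACTIONS[action] - EFFORT_COST[action]

# Simple epsilon-greedy bandit over the two actions.
q = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
for step in range(1000):
    if random.random() < 0.1:
        action = random.choice(list(ACTIONS))
    else:
        action = max(q, key=q.get)
    r = reward(action)
    counts[action] += 1
    q[action] += (r - q[action]) / counts[action]  # incremental average

print(q)  # the hack ends up with the higher estimated value
```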
And we're starting with unit tests now.
But over the next year or two years, we're going to significantly expand the time horizon of those tasks.
And it might be like, "achieve some goal." I mean, God, like "make money on the internet" or something like this.
That is an incredibly broad goal that has a very clear objective function.
So it's actually in some ways a good RL task once you're at that level of capability.
But it's also one that has incredible scope for misalignment, let's say.
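[Editor's note: a hypothetical sketch of what a "very clear objective function" for such a broad goal might look like. The function name and measurement are assumptions, not anything described in the conversation.]

```python
def reward(balance_before: float, balance_after: float) -> float:
    # "Make money on the internet": the scalar is trivially easy to measure...
    return balance_after - balance_before
    # ...but it says nothing about *how* the money was made, which is exactly
    # where the scope for misalignment comes in.
```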
MARK MANDELMANN- Totally.
But people have done that with the Constitution of the U.S. government, right? The U.S. government is, I think, a better analogy in some respects: it's a body that has goals and can act on the world, as opposed to an amorphous force like the Industrial Revolution.