Eliezer Yudkowsky
I know what you hope, but...
You know, you can hope that a particular set of winning lottery numbers come up, and it doesn't make the lottery balls come up that way.
I know you want this to be true, but why would it be true?
This is a science problem.
We are trying to predict what happens with AI systems that...
You tried to optimize them to imitate humans, and then you did some RLHF to them, and of course you didn't get perfect alignment, because that's not what happens when you hill-climb toward an outer loss function.
You don't get inner alignment on it.
But yeah, so...
I think that there is... So if you don't mind my taking some slight control of things and steering around to what I think is a good place to start...
I just failed to solve the control problem.
I've lost control of this thing.
Alignment, alignment.
Still aligned.
Control, yeah.
Okay, sure, yeah, you lost control.
But we're still aligned.
Yeah, losing control isn't as bad if you lose control to an aligned system.
Yes, exactly, exactly.
You have no idea of the horrors I will shortly unleash on this conversation.
So I think that there's like a Scylla and Charybdis here, if I'm pronouncing those words remotely like correctly, because of course I only ever read them and not hear them spoken.