Eliezer Yudkowsky
Because it's not random, but it also doesn't necessarily have room for humans in it.
I suspect that the average member of the audience might have some questions about even whether that's the correct paradigm to think about it and would sort of want to back up a bit, possibly.
Why?
I know what you hope, but...
You know, you can hope that a particular set of winning lottery numbers comes up, and it doesn't make the lottery balls come up that way.
I know you want this to be true, but why would it be true?
This is a science problem.
We are trying to predict what happens with AI systems that...
You tried to optimize to imitate humans, and then you did some RLHF on them, and of course you didn't get perfect alignment, because that's not what happens when you hill-climb toward an outer loss function.
You don't get inner alignment on it.
But yeah, so...
I think that there is... So if you don't mind my taking some slight control of things and steering around to what I think is a good place to start...
I just failed to solve the control problem.
I've lost control of this thing.
Alignment, alignment.
Still aligned.
Control, yeah.
Okay, sure, yeah, you lost control.
But we're still aligned.
Yeah, losing control isn't as bad if you lose control to an aligned system.