Eliezer Yudkowsky
And this is not a law of reality that they know about.
They do not know that when you compress something, air or a coolant, it gets hotter, and then you can transfer heat from it to room-temperature air,
and then expand it again, and now it's colder, and then you can transfer heat to that and blow out cold air.
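The cycle described here can be sketched with the ideal-gas adiabatic relation. This is a minimal illustration, not anything from the conversation; the gas, pressure ratio, and temperatures are made-up example values.

```python
# Sketch of the refrigeration principle described above: adiabatically
# compressing an ideal gas heats it; expanding it cools it. Dumping heat
# at the high-pressure stage is what lets the expanded gas end up colder
# than the room. Numbers are illustrative only (gamma = 1.4 for air).

GAMMA = 1.4  # heat capacity ratio for air

def adiabatic_temp(t_kelvin, p_ratio, gamma=GAMMA):
    """Temperature after an adiabatic pressure change by factor p_ratio."""
    return t_kelvin * p_ratio ** ((gamma - 1) / gamma)

room = 293.0                     # ~20 C room air, in kelvin
hot = adiabatic_temp(room, 3.0)  # compress 3x: gas is now hotter than the room
# Transfer heat from the compressed gas to room-temperature air until it
# cools back to `room`, then expand to the original pressure:
cold = adiabatic_temp(room, 1 / 3.0)  # now colder than the room

print(hot > room > cold)
```

Round-tripping (compress, then expand without dumping heat) returns the gas to its starting temperature; the cooling effect comes entirely from the heat transfer at high pressure.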
They don't know about any of that.
They're looking at a design, and they don't see how the design outputs cold air.
It uses aspects of reality that they have not learned.
So magic in this sense is I can tell you exactly what I'm going to do, and even knowing exactly what I'm going to do, you can't see how I got the results that I got.
That's a really nice example.
Even now, GPT-4, is it lying to you?
Is it using an invalid argument?
Is it persuading you via the kind of process that could persuade you of false things as well as true things?
Because the basic paradigm of machine learning that we are presently operating under is that you can have a loss function only for things you can evaluate.
If what you're evaluating is human thumbs up versus human thumbs down, you learn how to make the human press thumbs up.
That doesn't mean you're making the human press thumbs up via the kind of process the human actually wants to govern what they press thumbs up on.
You know, maybe you're just learning to fool the human.
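The dilemma above can be shown with a toy selection loop. Everything here is a made-up illustration, not anyone's actual training setup: a flawed rater presses thumbs up based on whether an answer matches their existing beliefs, not on whether it is true, so selecting purely on thumbs-up reward favors the policy that fools them.

```python
# Toy illustration: the training signal is only "did the rater press
# thumbs up," not "was the answer true." This rater (hypothetical) can
# only judge agreement with their beliefs, so reward depends on that alone.

import random

random.seed(0)

# Each policy is (answer_is_true, matches_rater_beliefs).
policies = {
    "honest":    (True,  False),  # true, but sometimes contradicts the rater
    "sycophant": (False, True),   # false, but tells the rater what they expect
}

def thumbs_up(is_true, matches_beliefs):
    """Flawed rater: thumbs-up probability ignores truth entirely."""
    return random.random() < (0.9 if matches_beliefs else 0.4)

def mean_reward(name, n=5000):
    is_true, matches = policies[name]
    return sum(thumbs_up(is_true, matches) for _ in range(n)) / n

best = max(policies, key=mean_reward)
print(best)  # selecting on thumbs-up alone picks the sycophant
```

The selection step never sees the `answer_is_true` flag, which is the point: whatever correlates with the verifiable signal gets amplified, whether or not it is what the rater wanted.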
That's so fascinating and terrifying, the question of lying.
On the present paradigm, what you can verify is what you get more of.
If you can't verify it, you can't ask the AI for it, because you can't train it to do things that you cannot verify.
Now, this is not an absolute law, but it's like the basic dilemma here.