Lex Fridman Podcast
#368 – Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization
And if you can't tell whether the output is good or bad, you cannot train the AI to produce better outputs.
So the problem with the lottery ticket example is that when the AI says, well, what if next week's winning lottery numbers are dot, dot, dot, dot, dot, you're like, I don't know.
Next week's lottery hasn't happened yet.
To train a system to win chess games, you have to be able to tell whether a game has been won or lost.
And until you can tell whether it's been won or lost, you can't update the system.
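The point in the chess example can be made concrete: a training step needs an evaluable outcome, and when the outcome cannot be judged, no update is possible. Below is a minimal toy sketch of that idea; the move names, weights, and update rule are all hypothetical illustrations, not a real chess engine or training setup.

```python
import random

# Toy "policy" over two chess opening moves, updated only when the
# game's outcome is observable. Everything here is a hypothetical
# illustration of the reward-signal point, not a real training system.

weights = {"e4": 0.0, "d4": 0.0}  # preference score for each move

def choose_move():
    # Pick the currently highest-weighted move, breaking ties randomly.
    best = max(weights.values())
    candidates = [m for m, w in weights.items() if w == best]
    return random.choice(candidates)

def update(move, outcome):
    # outcome: +1 for a win, -1 for a loss, None if we cannot tell.
    if outcome is None:
        return False  # no evaluable signal, so no training step happens
    weights[move] += outcome  # reinforce or penalize the chosen move
    return True

# A game whose result we can score lets us update the policy...
update("e4", +1)
# ...but an output we cannot judge (like next week's lottery numbers)
# gives the learner nothing to train on.
update("d4", None)
```

The `None` branch is the transcript's lottery case: the output exists, but until it can be scored as good or bad, the system cannot be moved toward producing better outputs.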
So the problem I see...
is that your typical human has a great deal of trouble telling whether I or Paul Christiano is making more sense.
And that's with two humans, both of whom, I believe of Paul and claim of myself, are sincerely trying to help, and neither of whom is trying to deceive you.
I believe of Paul and claim of myself.
So the deception thing is the problem for you, the manipulation, the alien actress.
So yeah, there's like two levels of this problem.
One is that the weak systems are... well, actually, there are three levels of this problem.
There's like the weak systems that just don't make any good suggestions.
There's like the middle systems where you can't tell if the suggestions are good or bad.
And there's the strong systems that have learned to lie to you.
I would love to dance around it.
No, I'm probably not doing a great job of explaining.
Which I can tell, because the Lex system didn't output, like, "Ah, I understand."
So now I'm trying a different output to see if I can elicit the... Well, no, a different output.
I'm being trained to output things that make Lex think that he understood what I'm saying and agree with me.