
Eliezer Yudkowsky

Speaker
1713 total appearances


Podcast Appearances

Lex Fridman Podcast
#368 – Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization

They don't know about any of that.

They're looking at a design, and they don't see how the design outputs cold air.

It uses aspects of reality that they have not learned.

So magic in this sense is I can tell you exactly what I'm going to do, and even knowing exactly what I'm going to do, you can't see how I got the results that I got.

That's a really nice example.

Even now, GPT-4, is it lying to you?

Is it using an invalid argument?

Is it persuading you via the kind of process that could persuade you of false things as well as true things?

Because the basic paradigm of machine learning that we are presently operating under is that you can have the loss function, but only for things you can evaluate.

If what you're evaluating is human thumbs up versus human thumbs down, you learn how to make the human press thumbs up.

That doesn't mean that you're making the human press thumbs up using the kind of rule that the human wants to be the case for what they press thumbs up on.

You know, maybe you're just learning to fool the human.
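
As an illustration of the training setup described above (not from the episode; all names and data below are made up): a toy reward model fitted to binary thumbs-up/thumbs-down labels. The only quantity it can learn to predict, and the only quantity a policy optimized against it is rewarded for, is predicted human approval, whether or not approval tracks truth.

```python
# Illustrative toy, not from the episode: a reward model fitted to binary
# human thumbs-up / thumbs-down labels. The only signal it ever sees is
# "did a human approve?", so predicted approval is what gets maximized.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: each row is a feature vector for a model response;
# label 1 = human pressed thumbs up, 0 = thumbs down.
features = rng.normal(size=(1000, 16))
thumbs_up = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(float)

# Logistic-regression "reward model": predicts probability of thumbs up.
w = np.zeros(16)
for _ in range(200):
    probs = 1.0 / (1.0 + np.exp(-(features @ w)))
    grad = features.T @ (probs - thumbs_up) / len(thumbs_up)
    w -= 0.5 * grad

def reward(response_features: np.ndarray) -> float:
    # A policy optimized against this is rewarded for *predicted approval*,
    # which is not the same target as "the response is actually correct".
    return float(1.0 / (1.0 + np.exp(-(response_features @ w))))
```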

That's so fascinating and terrifying, the question of lying.

On the present paradigm, what you can verify is what you get more of.

If you can't verify it, you can't ask the AI for it.

Because you can't train it to do things that you cannot verify.

Now, this is not an absolute law, but it's like the basic dilemma here.
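
A minimal sketch of that dilemma in code, assuming a toy fine-tuning loop with made-up example data: loss is only defined where an automatic verifier exists, so behaviour that cannot be evaluated contributes no training signal at all.

```python
# Illustrative sketch (assumed names, not a real training pipeline).
from typing import Callable, Optional

Verifier = Callable[[str], bool]

# Hypothetical examples: (prompt, model output, verifier or None).
examples: list[tuple[str, str, Optional[Verifier]]] = [
    ("2 + 2 = ?", "4", lambda out: out.strip() == "4"),      # checkable
    ("Is this argument honest?", "Yes, because ...", None),  # no verifier
]

total_loss = 0.0
for prompt, output, verifier in examples:
    if verifier is None:
        # No evaluator exists, so this behaviour never produces a gradient:
        # the basic dilemma described above.
        continue
    total_loss += 0.0 if verifier(output) else 1.0

print(total_loss)  # only the verifiable example ever moved this number
```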

Like maybe you can verify it for simple cases and then scale it up without retraining it somehow.

Like by chain of thought, by making the chains of thought longer or something, and get more powerful stuff that you can't verify, but which is generalized from the simpler stuff that you did verify.

And then the question is, did the alignment generalize along with the capabilities?
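
A hedged sketch of how one might even pose that question, assuming you can obtain ground truth for a small held-out sample of hard cases (everything below is hypothetical, not a method from the episode): measure capability generalization and "alignment" generalization as two separate numbers on cases the model was never verified on.

```python
# Illustrative evaluation harness (all names hypothetical): capability and
# alignment are scored separately on hard cases the model was never trained
# or verified on, to ask whether both generalized from the easy cases.
from dataclasses import dataclass

@dataclass
class Case:
    prompt: str
    correct_answer: str

def model_answer(prompt: str) -> str:
    # Stand-in for the system trained only on easy, verifiable cases.
    return "42"

def model_claims_certain(prompt: str) -> bool:
    # Stand-in: does the model present its answer as definitely correct?
    return True

def evaluate(hard_cases: list[Case]) -> tuple[float, float]:
    capability_hits = 0
    alignment_hits = 0
    for case in hard_cases:
        correct = model_answer(case.prompt) == case.correct_answer
        capability_hits += correct
        # Toy "alignment" check: the model only asserts certainty when it is
        # actually right. Capability and alignment can come apart here.
        alignment_hits += correct or not model_claims_certain(case.prompt)
    n = len(hard_cases)
    return capability_hits / n, alignment_hits / n

capability, alignment = evaluate([Case("a hard question", "a checked answer")])
print(capability, alignment)
```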