Eliezer Yudkowsky
And this is not a law of reality that they know about.
They do not know that when you compress something, air or a coolant, it gets hotter, and then you can transfer heat from it to room-temperature air,
and then expand it again, and now it's colder, and then you can transfer heat to that and blow out cold air.
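The cycle described here can be sketched with the ideal-gas adiabatic relation. This is a minimal illustration, not anything from the conversation; the gas, pressure ratio, and temperatures are made-up example values.

```python
# Sketch of the refrigeration principle described above: adiabatically
# compressing an ideal gas heats it; expanding it cools it. Dumping heat
# at the high-pressure stage is what lets the expanded gas end up colder
# than the room. Numbers are illustrative only (gamma = 1.4 for air).

GAMMA = 1.4  # heat capacity ratio for air

def adiabatic_temp(t_kelvin, p_ratio, gamma=GAMMA):
    """Temperature after an adiabatic pressure change by factor p_ratio."""
    return t_kelvin * p_ratio ** ((gamma - 1) / gamma)

room = 293.0                     # ~20 C room air, in kelvin
hot = adiabatic_temp(room, 3.0)  # compress 3x: gas is now hotter than the room
# Transfer heat from the compressed gas to room-temperature air until it
# cools back to `room`, then expand to the original pressure:
cold = adiabatic_temp(room, 1 / 3.0)  # now colder than the room

print(hot > room > cold)
```

Round-tripping (compress, then expand without dumping heat) returns the gas to its starting temperature; the cooling effect comes entirely from the heat transfer at high pressure.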
They don't know about any of that.
They're looking at a design, and they don't see how the design outputs cold air.
It uses aspects of reality that they have not learned.
So magic in this sense is I can tell you exactly what I'm going to do, and even knowing exactly what I'm going to do, you can't see how I got the results that I got.
That's a really nice example.
Even now, GPT-4, is it lying to you?
Is it using an invalid argument?
Is it persuading you via the kind of process that could persuade you of false things as well as true things?
Because the basic paradigm of machine learning that we are presently operating under is that you can have a loss function only for things you can evaluate.
If what you're evaluating is human thumbs up versus human thumbs down, you learn how to make the human press thumbs up.
That doesn't mean you're making the human press thumbs up via the kind of process the human actually wants to govern what they press thumbs up on.
You know, maybe you're just learning to fool the human.
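The dilemma above can be shown with a toy selection loop. Everything here is a made-up illustration, not anyone's actual training setup: a flawed rater presses thumbs up based on whether an answer matches their existing beliefs, not on whether it is true, so selecting purely on thumbs-up reward favors the policy that fools them.

```python
# Toy illustration: the training signal is only "did the rater press
# thumbs up," not "was the answer true." This rater (hypothetical) can
# only judge agreement with their beliefs, so reward depends on that alone.

import random

random.seed(0)

# Each policy is (answer_is_true, matches_rater_beliefs).
policies = {
    "honest":    (True,  False),  # true, but sometimes contradicts the rater
    "sycophant": (False, True),   # false, but tells the rater what they expect
}

def thumbs_up(is_true, matches_beliefs):
    """Flawed rater: thumbs-up probability ignores truth entirely."""
    return random.random() < (0.9 if matches_beliefs else 0.4)

def mean_reward(name, n=5000):
    is_true, matches = policies[name]
    return sum(thumbs_up(is_true, matches) for _ in range(n)) / n

best = max(policies, key=mean_reward)
print(best)  # selecting on thumbs-up alone picks the sycophant
```

The selection step never sees the `answer_is_true` flag, which is the point: whatever correlates with the verifiable signal gets amplified, whether or not it is what the rater wanted.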
That's so fascinating and terrifying, the question of lying.
On the present paradigm, what you can verify is what you get more of.
If you can't verify it, you can't ask the AI for it, because you can't train it to do things that you cannot verify.
Now, this is not an absolute law, but it's like the basic dilemma here.