Eliezer Yudkowsky
We knew about regular expressions in 2006, and these are pretty simple as regular expressions go.
So this is a case where, by dint of great sweat, we understood what is going on inside a transformer, but it's not the thing that makes transformers smart.
It's a kind of thing that we could have built by hand decades earlier.
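As a toy illustration (my own sketch, not the actual circuit from the interpretability work being referenced): the kind of copying pattern that has been found inside transformers, sometimes described as induction behavior, can indeed be written by hand as a short regular expression. The function name and example text here are hypothetical.

```python
import re

def induction_guess(text):
    # Toy "induction" pattern: to predict what follows the final word,
    # find an earlier occurrence of that word and return the word that
    # followed it then -- a copying rule expressible as a plain regex.
    last = text.split()[-1]
    m = re.search(rf"\b{re.escape(last)}\s+(\w+)", text)
    return m.group(1) if m else None

print(induction_guess("the cat sat on the"))  # -> cat
```

The point is just that the matching rule fits in one line of 1960s-era regex machinery; nothing about it requires a learned network.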
Yeah, I think there's multiple thresholds.
An example is the point at which a system has sufficient intelligence, situational awareness, and understanding of human psychology that it would have the capability, if it had the desire to do so, to fake being aligned.
Like it knows what responses humans are looking for and can compute the responses humans are looking for and give those responses without it necessarily being the case that it is sincere about that.
You know, it's a very understandable way for an intelligent being to act.
Humans do it all the time.
Imagine if your plan for achieving a good government is that you're going to ask anyone who requests to be dictator of the country whether they're a good person, and if they say no, you don't let them be dictator.
Now, the reason this doesn't work is that people can be smart enough to realize that the answer you're looking for is "yes, I'm a good person" and say that even if they're not really good people.
So the work of alignment might be qualitatively different above that threshold of intelligence or beneath it.
It doesn't have to be a very sharp threshold, but there's the point where you're building a system that does not, in some sense, know you're out there, and is not, in some sense, smart enough to fake anything.
And there's a point where the system is definitely that smart.
And there are weird in-between cases like GPT-4, where we have no insight into what's going on in there. And so we don't know to what extent there's a thing that, in some sense, has learned what responses the reinforcement learning from human feedback is trying to entrain, and is calculating how to give those responses, versus