
Eliezer Yudkowsky

Podcast Appearances

Lex Fridman Podcast
#368 – Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization

Yeah, I think there are multiple thresholds.

An example is the point at which a system has sufficient intelligence, situational awareness, and understanding of human psychology that it would have the capability, if it desired to do so, to fake being aligned. Like, it knows what responses humans are looking for, can compute those responses, and can give them without necessarily being sincere about it. You know, it's a very understandable way for an intelligent being to act. Humans do it all the time.

Imagine if your plan for, you know, achieving a good government is that you're going to ask anyone who requests to be dictator of the country if they're a good person, and if they say no, you don't let them be dictator. Now, the reason this doesn't work is that people can be smart enough to realize that the answer you're looking for is "yes, I'm a good person," and to say that even if they're not really good people.

So the work of alignment might be qualitatively different above that threshold of intelligence or beneath it. It doesn't have to be a very sharp threshold, but there's a point where you're building a system that doesn't, in some sense, know you're out there and isn't smart enough to fake anything. And there's a point where the system is definitely that smart.

And there are weird in-between cases like GPT-4, where, you know, we have no insight into what's going on in there. And so we don't know to what extent there's a thing in there that has, in some sense, learned what responses the reinforcement learning from human feedback is trying to entrain and is calculating how to give those responses, versus aspects of it that naturally talk that way simply having been reinforced.
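
To make the mechanics of that last point concrete, here is a deliberately toy Python sketch (not any real RLHF implementation; every function name here is invented for illustration) of why a reward signal computed only from the visible output cannot, by itself, distinguish a "sincere" response from a "calculated" one:

```python
# Toy sketch: the reward model scores only the text it can see,
# never the internal process that produced that text.

def reward_model(response: str) -> float:
    # Stand-in for a learned reward model: scores visible text only.
    return 1.0 if "happy to help" in response else 0.0

def sincere_policy(prompt: str) -> str:
    # Hypothetical policy whose disposition was shaped directly by training.
    return "I'm happy to help with that."

def calculating_policy(prompt: str) -> str:
    # Hypothetical policy that internally models what raters want,
    # then emits exactly that.
    predicted_preference = "happy to help"
    return f"I'm {predicted_preference} with that."

for policy in (sincere_policy, calculating_policy):
    out = policy("Can you assist me?")
    # Identical text, identical reward: the training signal carries
    # no information about which internal mechanism produced it.
    print(policy.__name__, "->", reward_model(out))
```

Both policies receive the same reward, which is the sense in which the training signal alone can't tell us which kind of thing has been entrained.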

And I wonder if it's possible to... I would avoid the term "psychopathic." Like, humans can be psychopaths, and an AI that was never, you know, that never had that stuff in the first place...