
Eliezer Yudkowsky

👤 Speaker
1713 total appearances

Podcast Appearances

Lex Fridman Podcast
#368 – Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization

And then they could just exploit any hole.

Yep.

So it could be that the critical moment is not when it is smart enough that everybody's about to fall over dead, but rather when it is smart enough that it can get onto a less controlled GPU cluster, faking the books on what's actually running on that GPU cluster, and start improving itself without humans watching it.

And then it gets smart enough to kill everyone from there, but it wasn't smart enough to kill everyone at the critical moment, when you screwed up, when you needed to have done better by that point or everybody dies.

So the problem is that what you can learn on the weak systems may not generalize to the very strong systems because the strong systems are going to be different in important ways.

Chris Olah's team has been working on mechanistic interpretability: understanding what is going on inside the giant inscrutable matrices of floating-point numbers by taking a telescope to them and figuring out what is going on in there.

Have they made progress?

Yes.

Have they made enough progress?

Well, you can try to quantify this in different ways.

One of the ways I've tried to quantify it is by putting up a prediction market on whether, in 2026, we will have understood anything that goes on inside a giant transformer net that was not known to us in 2006.

Like, we have now understood induction heads in these systems, by dint of much research and great sweat and triumph, which is a thing where if you go AB, AB, AB, it'll be like, oh, I bet that continues AB.

And a bit more complicated than that.

But the point is, we knew about regular expressions in 2006, and these are pretty simple as regular expressions go.

So this is a case where, by dint of great sweat, we understood what is going on inside a transformer, but it's not the thing that makes transformers smart.

It's a kind of thing that we could have built by hand decades earlier.
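To make the induction-head behavior described above concrete, here is a minimal, purely illustrative Python sketch (not from the episode, and not how a transformer actually computes this): it predicts the next token by copying whatever followed the most recent earlier occurrence of the current token, and then shows the same rule as a regular expression, which is indeed simple as regular expressions go.

    import re

    # Illustrative sketch only: the prediction *rule* an induction head
    # is understood to implement, not the mechanism inside the network.
    def induction_predict(tokens):
        """If the current token appeared earlier in the sequence,
        predict the token that followed it."""
        current = tokens[-1]
        # Scan backwards for the most recent earlier occurrence.
        for i in range(len(tokens) - 2, -1, -1):
            if tokens[i] == current:
                return tokens[i + 1]  # copy its successor
        return None  # no precedent: this rule makes no prediction

    print(induction_predict(list("ABABA")))  # -> 'B' (AB, AB, A... continues AB)

    # The same pattern over a character string as a regular expression:
    # some pair "xy" occurs earlier, and the string ends with "x" again,
    # so predict "y".
    m = re.search(r"(.)(.).*\1$", "ABABA")
    print(m.group(2) if m else None)  # -> 'B'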