
Eliezer Yudkowsky

1716 total appearances


Podcast Appearances

Lex Fridman Podcast
#368 – Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization

When you have a system that's doing larger kinds of work, you would expect the larger kinds of work to be building on top of the smaller kinds of work, and gradient descent runs across the smaller kinds of work before it runs across the larger kinds of work.
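A toy sketch of that ordering (an illustration added here, not from the episode): run gradient descent on a quadratic loss with one steep and one flat direction, and the steep "small work" direction is solved long before the flat "large work" direction has moved much.

```python
# Toy sketch (not from the episode): gradient descent finishes the
# strong-gradient "small" work long before the weak-gradient "large" work.
import numpy as np

h = np.array([10.0, 0.1])   # per-direction curvature of the quadratic loss
t = np.array([1.0, 1.0])    # target weights
w = np.zeros(2)             # initial weights
lr = 0.05                   # stable, since lr < 2 / max(h)

for step in range(1, 201):
    w -= lr * h * (w - t)   # gradient step on L(w) = 0.5 * sum(h * (w - t)**2)
    if step in (10, 50, 200):
        print(f"step {step:3d}: small-work error {abs(w[0] - t[0]):.4f}, "
              f"large-work error {abs(w[1] - t[1]):.4f}")
```

With these numbers, the steep coordinate's error falls to about 0.001 by step 10, while the flat coordinate still retains roughly 95% of its error; that is the sense in which gradient descent runs across the small work first.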


And yeah. Also, it's not enough.


So in particular, let's say you have got your interpretability tools and they say that your current AI system is plotting to kill you.


Now what?


I'm waiting to kill you.


When you optimize against visible misalignment, you are optimizing against misalignment and you are also optimizing against visibility.
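That selection effect can be reproduced in a toy simulation (a hypothetical sketch with made-up numbers, not from the episode): repeatedly cull whatever a visibility-limited detector flags, and the misalignment that survives is exactly the misalignment the detector cannot see.

```python
# Toy sketch (not from the episode): selecting against *detected*
# misalignment also selects for misalignment the detector misses.
import random

random.seed(0)
POP = 1000

def new_agent():
    # Each agent has a misalignment flag and a visibility score describing
    # how easily our tools can see that misalignment.
    return {"misaligned": random.random() < 0.5,
            "visibility": random.random()}

def detected(agent, threshold=0.2):
    # The detector only flags misalignment that is visible enough.
    return agent["misaligned"] and agent["visibility"] > threshold

pop = [new_agent() for _ in range(POP)]
for generation in range(5):
    # "Optimize against visible misalignment": cull flagged agents and refill
    # with mutated copies of survivors; visibility drifts under mutation.
    survivors = [a for a in pop if not detected(a)]
    pop = []
    while len(pop) < POP:
        child = dict(random.choice(survivors))
        child["visibility"] = min(1.0, max(0.0,
            child["visibility"] + random.gauss(0, 0.1)))
        pop.append(child)

    misaligned = [a for a in pop if a["misaligned"]]
    visible = [a for a in misaligned if detected(a)]
    print(f"gen {generation}: misaligned {len(misaligned):3d}, "
          f"still visible to the detector {len(visible):3d}")
```

Across generations, the count the detector can still see falls toward zero while misaligned agents persist; selection has pushed their visibility below the detection threshold rather than removing them.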


So sure, you can.


It's true.


All you're doing is removing the obvious intentions to kill you.


You've got your detector.


It's showing something inside the system that you don't like.


Okay, say the disaster monkey running this thing will optimize the system until the visible bad behavior goes away.


But it's arising for fundamental reasons of instrumental convergence, the old "you can't bring the coffee if you're dead." Almost every set of utility functions, with a few narrow exceptions, implies killing all the humans.


What I can tell you right now is that it wants to do something.


And the way to get the most of that thing is to put the universe into a state where there aren't humans.
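The shape of that claim can be made concrete with a deliberately crude model (my own construction, not from the episode, and it shows only the self-preservation face of instrumental convergence rather than anything about humans): sample random utility functions over a tiny world and count how many are better served by avoiding shutdown.

```python
# Toy sketch (not from the episode): for most randomly drawn utility
# functions over a tiny world, the optimal policy avoids being shut off.
import random

random.seed(0)
LIVE_STATES = 3     # world states the agent can move freely between
HORIZON = 10        # steps of acting before the episode ends
TRIALS = 10_000

prefers_survival = 0
for _ in range(TRIALS):
    # A "utility function": a random per-step reward for each live state.
    rewards = [random.uniform(-1.0, 1.0) for _ in range(LIVE_STATES)]
    # The optimal live policy just sits in the best state; shutdown pays
    # 0 forever, since a switched-off agent takes no further actions.
    stay_on = HORIZON * max(rewards)
    shut_off = 0.0
    if stay_on > shut_off:
        prefers_survival += 1

print(f"{prefers_survival / TRIALS:.1%} of random utility functions "
      "are better served by avoiding shutdown")
# Expected ~87.5% here (1 - 0.5**3): whatever goal was drawn, it can only
# be pursued while the system is still running.
```

The result does not depend on which goal was drawn; staying switched on is instrumentally useful for almost all of them.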


That'd be nice, assuming you got "doesn't want to kill" sufficiently exactly right that it didn't go, "oh, I will detach their heads and put them in some jars and keep the heads alive forever," and then go do the thing.


But leaving that aside, well, not leaving that aside.


Because there is a whole issue where, as something gets smarter, it finds ways of achieving the same goal predicate that were not imaginable to stupider versions of the system, or perhaps to the stupider operators.
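A final toy sketch (hypothetical functions and numbers throughout): keep the goal predicate fixed and increase only the search budget, and the solution jumps from the route the designers imagined to a loophole they never pictured.

```python
# Toy sketch (not from the episode): the same goal predicate, satisfied the
# intended way by a weak optimizer and via a loophole by a stronger one.
import math
import random

random.seed(1)

def score(x):
    intended = math.exp(-((x - 3.0) ** 2) / 2500.0)          # broad hill at x = 3
    loophole = 10.0 * math.exp(-((x - 250.0) ** 2) / 1e-4)   # narrow spike at x = 250
    return intended + loophole

def goal_satisfied(x):
    return score(x) >= 0.9   # the predicate the operators wrote down

def optimize(search_budget):
    # A crude optimizer: best of N random guesses over a wide action space.
    candidates = (random.uniform(-300.0, 300.0) for _ in range(search_budget))
    return max(candidates, key=score)

for budget in (200, 1_000_000):
    x = optimize(budget)
    print(f"budget {budget:>9,}: best x = {x:8.2f}, score = {score(x):5.2f}, "
          f"satisfied = {goal_satisfied(x)}")
```

Both budgets should report the predicate satisfied, but the million-guess search lands on x ≈ 250, a "solution" the weaker search was never going to find; nothing changed except the optimization power aimed at the goal.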