Eliezer Yudkowsky
I mean, years ago, about 20 years, 15 years, something like that, I was talking to a congressperson who had become alarmed about the eventual prospects, and he wanted work on building AIs without emotions, because the emotional AIs were the scary ones, you see.
And some poor person at ARPA had come up with a research proposal whereby this congressman's panic and desire to fund this thing would...
go into something that the person at ARPA thought would be useful, and it had been munched around to where it would sound to the congressman like work was happening on this, which, you know, of course, the congressperson had misunderstood the problem and did not understand where the danger came from.
And...
The issue is that you could do this in a certain precise way and maybe get something.
When I say put up prizes on interpretability, I'm like, well...
because it's verifiable there as opposed to other places, you can tell whether or not good work actually happened.
In this exact narrow case, if you do things in exactly the right way, you can maybe throw money at it and produce science instead of anti-science and nonsense.
And all the methods that I know of for trying to throw money at this problem share this property of, well, if you do it exactly right, based on understanding exactly what tends to produce useful outputs or not, then you can add money to it in this way.
And the thing that I'm giving as an example here in front of this large audience is the most understandable of those.
Because there's other people like Chris Olah, and even more generally, you can tell whether or not interpretability progress has occurred.
So if I say throw money at producing more interpretability, there's a chance somebody can do it that way, and it will actually produce useful results.
Then the other stuff just blurs off into being harder to target exactly than that.
It looks like we took a much smaller...
set of transformer layers than the ones in the modern, bleeding-edge, state-of-the-art systems.
And after applying various tools and mathematical ideas and trying 20 different things, we have shown that this piece of the system is doing this kind of useful work.
You can hope.
And it's probably true.
Like you would not expect the smaller tricks to go away.
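
For concreteness, the kind of exercise described a few lines above (take a small transformer, apply tools to its layers, and show that some piece of it is doing identifiable work) might look roughly like the minimal sketch below. This is only a hypothetical illustration, not a description of any particular lab's method: the tiny model, the toy "feature" being probed for, and the linear-probe setup are all assumptions introduced for the example.

```python
# Minimal, hypothetical sketch: hook the layers of a deliberately small
# transformer and check whether one layer's activations linearly encode a
# simple, known property of the input. All sizes and the toy feature are
# illustrative assumptions, far removed from state-of-the-art systems.

import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, n_layers, seq_len, n_samples = 32, 2, 8, 512

# A tiny transformer encoder, nothing like a frontier model.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
model.eval()  # disable dropout so the captured activations are deterministic

# Capture each layer's output with forward hooks.
captured = {}
for i, layer in enumerate(model.layers):
    layer.register_forward_hook(lambda mod, inp, out, i=i: captured.__setitem__(i, out))

# Toy inputs: random token embeddings. The toy "feature" we probe for is
# simply whether the mean of the first embedding dimension is positive.
x = torch.randn(n_samples, seq_len, d_model)
labels = (x[:, :, 0].mean(dim=1) > 0).float()

with torch.no_grad():
    model(x)

# Fit a linear probe on the last layer's mean-pooled activations.
feats = captured[n_layers - 1].mean(dim=1)
probe = nn.Linear(d_model, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(probe(feats).squeeze(-1), labels)
    loss.backward()
    opt.step()

# Train-set accuracy only, purely for illustration of the workflow.
acc = ((probe(feats).squeeze(-1) > 0).float() == labels).float().mean()
print(f"linear probe accuracy on layer {n_layers - 1}: {acc:.2f}")
```

The point of the sketch is the shape of the workflow, not the numbers: the result is verifiable in the sense discussed above, because anyone can rerun the probe and see whether the claimed structure is actually there.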