Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model – Scott Alexander & Daniel Kokotajlo
So you're training them on two different things.
First, you're rewarding them for this deceptive behavior.
Second, you're punishing them for it.
And we don't have a great prediction for exactly how this is going to end.
One way it could end is you have an AI that is kind of the equivalent of the startup founder who really wants their company to succeed, really likes making money, really likes the thrill of successful tasks.
They're also being regulated, and they're like, yeah, I guess I'll follow the regulations.
I don't want to go to jail.
But it's not like robustly, deeply aligned to, yes, I love regulations.
My deepest drive is to follow all of the regulations in my industry.
So we think that an AI like that, as time goes on and as this recursive self-improvement process goes on, will kind of get worse rather than better.
It will move from this vague superposition of "well, I want to succeed, and I also want to follow the rules" to being smart enough to genuinely understand its own goal system and conclude: my goal is success.
I have to pretend to want to do all of these moral things while the humans are watching me.
That's what happens in our story.
And then at the very end, the AIs reach a point where the humans are pushing them to have clearer and better goals because that's what makes the AIs more effective.
And they eventually clarify their goals so much that they just say, yes, we want task success.
We're going to pretend to do all these things well while the humans are watching us.
And then they outgrow the humans, and there's disaster.
Yeah, we don't know how this will work at the limit of all these different training methods, but we're also not completely making this up.