
Scott Alexander

👤 Speaker
4620 total appearances

Podcast Appearances

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

I mean, literally, this happens in our scenario.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

This is, like, the, like...

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

August 2027 alignment crisis, where they, like, notice some warning signs like this, uh, in their, like, sort of hive mind, right? And, um, in the branch where they slow down and fix the issues, then great, they slowed down and fixed the issues and figured out what was going on. But then in the other branch, because of the race dynamics and because it's not like a super smoking gun, they proceed with some sort of, like, shallow patch, you know.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

So I do expect there to be warning signs like that.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And then if they do make those decisions in the race dynamics earlier on, then I think that when the systems are vastly superintelligent and they're even more powerful because they've been deployed halfway through the economy already and everyone's getting really scared by the news reports about the new Chinese killer drones or whatever the Chinese AIs are building on the side of the Pacific,

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

I'm imagining basically just like similar things playing out so that even if there is some concerning evidence that someone finds where some of the superintelligence in some silo somewhere slipped up and did something that's like pretty suspicious, like, I don't know.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

I run a good Bing.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

Plus one to that, if I could just double-click on that.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

Go back to, like, 2015, and I think the way people typically thought, myself included, that we'd get to AGI would be kind of like the RL on video games thing that was happening.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

So imagine, like...

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

instead of just training on StarCraft or Dota, you basically train on all the games in the Steam library.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And then you get this awesome player of games AI that can just zero-shot crush a new game that it's never seen before.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And then you take it into the real world and you start teaching it English and you start training it to do coding tasks for you and stuff like that.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And if that had been the trajectory that we took to get to AI, summarizing the agency first and then world understanding trajectory...

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

it would be quite terrifying, because you'd have this, like, really powerful, sort of, like, aggressive long-horizon agent that wants to win.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And then you're like trying to teach it English and get it to like do useful things for you.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And it's just like so plausible that what's really going to happen is it's going to like learn to say whatever it needs to say in order to like make you give it the reward or whatever.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

And then will totally betray you later when it's all in charge, right?

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

Yeah.

Dwarkesh Podcast
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo

But we didn't go that way.