Scott Alexander
I mean, literally, this happens in our scenario.
This is, like, the August 2027 alignment crisis, where they notice some warning signs like this in their sort of hive mind, right? And in the branch where they slow down and fix the issues, then great, they slowed down and fixed the issues and figured out what was going on. But then in the other branch, because of the race dynamics and because it's not like a super smoking gun, they proceed with some sort of shallow patch, you know.
So I do expect there to be warning signs like that.
And then if they do make those decisions in the race dynamics earlier on, then I think that when the systems are vastly superintelligent and they're even more powerful because they've been deployed halfway through the economy already and everyone's getting really scared by the news reports about the new Chinese killer drones or whatever the Chinese AIs are building on the side of the Pacific,
I'm imagining basically just similar things playing out, so that even if there is some concerning evidence that someone finds, where some of the superintelligences in some silo somewhere slipped up and did something that's pretty suspicious, like, I don't know.
I run a good Bing.
Plus one to that, if I could just double-click on that.
Go back to, like, 2015, and I think the way people, including myself, typically thought we'd get to AGI would be kind of like the RL on video games thing that was happening.
So imagine, like, instead of just training on StarCraft or Dota, you basically train on all the games in the Steam library.
And then you get this awesome player of games AI that can just zero-shot crush a new game that it's never seen before.
And then you take it into the real world and you start teaching it English and you start training it to do coding tasks for you and stuff like that.
And if that had been the trajectory that we took to get to AI, summarizing the agency first and then world understanding trajectory, it would be quite terrifying, because you'd have this really powerful, sort of aggressive, long-horizon agent that wants to win.
And then you're trying to teach it English and get it to do useful things for you.
And it's just so plausible that what's really going to happen is it's going to learn to say whatever it needs to say in order to make you give it the reward or whatever.
And then it'll totally betray you later when it's all in charge, right?
Yeah.
But we didn't go that way.