Scott Alexander
Podcast Appearances
And you're also slapping on some sort of alignment training as well.
We don't know what actual goals will end up inside the AIs and what the sort of internal structure of that will be like, what goals will be instrumental versus terminal.
We have a couple of different hypotheses, and we, like, picked one for purposes of telling the story.
I'm happy to go into more detail if you want about, like, the mechanistic details of the particular hypothesis we picked, or, like, the different alternative hypotheses that we didn't depict in the story that also seem plausible to us.
Things like this do happen pretty frequently, so...
OpenAI also just had a paper about the hacking stuff where it's literally in the chain of thought, like, "let's hack," you know?
And also, anecdotally, me and a bunch of friends have found that the models often seem to just double down on their BS.
There's a mounting pile of evidence that at least some of the time they are just actually lying.
They know that what they're doing is not what you wanted and they're doing it anyway.
I think there's a mounting pile of evidence that that does happen.
I think I'd also mention the homogeneity point.
Like, you know, any group of humans, even if they're all, like, the exact same race and gender, is, like, going to be much more diverse than the army of AIs in the data center, because they'll be mostly, like, literal copies of each other, you know?
And I think that goes for a lot.
Another thing I was going to mention is something our scenario doesn't really explore.
I think in our scenario they're more of, like, a monolith, but historically, a lot of crazy conquests were carried out by groups that were not at all monoliths.
And I've been heavily influenced by reading the history of the conquistadors, which you may know about.
But did you know that when Cortes took over Mexico, he had to pause halfway through, go back to the coast, and fight off a larger Spanish expedition that was sent to arrest him?
So, like, the Spanish were fighting each other in the middle of the conquest of Mexico.
Similarly, in the conquest of Peru, Pizarro was replicating Cortes' strategy, which, by the way, was go get a meeting with the emperor and then kidnap the emperor and force him at sword point to...