Dwarkesh
That's not the Terminator scenario.
That's just one of these natural consequences of how we train it.
And I think that once a thousand of these natural consequences of training add up, the AI is evil, in the same way that once the AI can do chess and philosophy and all these other things, eventually you've got to admit it's intelligent.
Yeah.
So I think that each individual failure, maybe it will make the national news.
Maybe people say, oh, it's so strange that GPT-7 did this particular thing and then they'll train it away and then it won't do that thing.
And there will be some point in the process of becoming superintelligent at which it makes, I don't want to say the last mistake, because you'll probably have a gradually decreasing number of mistakes approaching some asymptote, but the last mistake that anyone worries about.
And after that, it will be able to do its own thing.
Yeah, I think the alignment community did not really expect LLMs.
I mean, if you look in Bostrom's Superintelligence, there's a discussion of Oracle AIs, which are sort of like LLMs.
I think that came as a surprise.
I think one of the reasons I'm more hopeful than I used to be is that LLMs are great compared to the kind of reinforcement learning self-play agents that they expected.
I do think that now we are kind of starting to move away from the LLMs to those reinforcement learning agents.
We're going to face all of these problems again.
So...
I am the writer and the celebrity spokesperson for this scenario.
I am the only person on the team who is not a genius forecaster.
And maybe related to that, my p(doom) is the lowest of anyone on the team.