Rob Wiblin
What would something like that actually look like according to our best understanding of physics?
I recently came across a really interesting analysis of exactly this question from the AI researcher Baron Millich, and I want to walk you through it because it's incredibly fun. It's remarkable how much we can likely predict this far in advance, and, as I'll explain, it might even be decision-relevant to humanity in the relatively near future.
If you spend any time thinking about the far future, and by far future I mean billions of years from now, one question that really matters is whether our universe mostly ends up peaceful or mostly ends up violent.
Does it settle into a patchwork of civilizations each quietly doing their own thing?
Today, I'm speaking with Rohin Shah, who is head of AGI alignment and safety at Google DeepMind.
And so I suppose, Rohin, you've ended up, for better or worse, hopefully for better, being one of the more influential, dare I even say powerful people, to come out of the AGI alignment and safety ecosystem and school of thought.
And I guess you were generous enough to be super opinionated with me when you came on the show two years ago.
And I think judging by the notes that you've sent over this week, you're ready to be opinionated again.
Thanks so much for coming back on the show, Rohin.
That's how we like it.
If you were representing Google DeepMind, it might sound more like a press release.
So you were really very early, in the scheme of things, to the whole misalignment, AI, AGI security issue.
I suppose you got involved in 2017.
So among the first few percent, I suppose, of people who, I guess, started working on this professionally.
But despite that, you think that probably we're not going to get catastrophic misalignment, that our chances are really pretty good, and that probably prosaic, like ordinary alignment techniques, the kinds of things that Google DeepMind and other AI companies are doing, will probably succeed at preventing at least catastrophic misalignment.
Why do you think our chances are so good?
Yeah.
I mean, people have tried to put forward arguments for why this is likely or inevitable.
There's obviously the Yudkowsky-style argument, which I guess is focused on misgeneralization and adversarial examples.
I guess, yeah, there's the Ajeya Cotra and Joe Carlsmith take, which I guess Carlsmith describes best in "Is Power-Seeking AI an Existential Risk?", which is more focused on, I guess, accidentally teaching AIs to deceive us by giving them unfortunately inaccurate feedback, I suppose.