Rob Wiblin

Speaker
3881 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 1
Confidence: Medium

Podcast Appearances

80,000 Hours Podcast
What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

And I mean, I guess people would say they might prefer the second one because it's easier to grade or it's easier to see.

And perhaps they don't feel in as good a position as you do to understand whether substantively any company is doing the right thing on the merits of what will matter in the long term.

But I guess you're saying that's why you wanna have experts like AI Lab Watch or whoever else who can have their eye on the ball, who can spend their time really thinking that through and actually grade it for everyone else so they can understand.

What about an entirely different justification for having thorough detail in the model cards, which is that the rest of the world needs to know for practical reasons: other researchers, over in the Bay Area rather than in the UK, get actual research benefit out of understanding all of these things that GDM is doing and what the model looks like.

That's going to help other companies do a better job of their own model cards and their own evals.

What do you make of that?

You told me that you think research papers that come out of GDM tend to fly under the radar a bit in general, I suppose, in part because GDM is based here in London rather than in the Bay Area, where I guess the greatest amount of socializing and energy around these issues exists.

Yeah, tell me more about that.

Okay, yeah, let's talk about one of the papers, one of the interesting research results that definitely flew under the radar as far as I could tell.

First one is "Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking."

I don't know how that didn't go viral.

What did you accomplish with that work?

Yeah, so I feel like I've heard this broad idea going back a very long time, which is that, I guess, you might be worried that a model will fake or basically cheat at accomplishing the ultimate goal, because all it gets is a reward signal of whether it accomplished it or not.

So imagine you've got the model running a business and you ask it to make money and, rather than running a successful business, it instead goes and steals a bunch of money because it figures out that that's actually a more effective way of increasing its bank balance.

Yep.

That's a concern that you might have.

One way that you could prevent that is that rather than reinforcing and evaluating the model based on the final outcome, instead you look at, you sample, I guess, from the actions that it's saying it's going to take or the actions that it did take, and then you grade them on whether they seemed reasonable and sensible to you or to some other monitor in light of the goal that you actually had.

And then you wouldn't get this reward hacking behavior.
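
A toy sketch of the contrast described here, in Python. It is only an illustration of the idea as stated in the conversation, not the method from the paper: the action names, the numbers, and the `looks_reasonable` flag are all made-up assumptions standing in for a real environment and a real human or automated monitor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    # All fields are hypothetical, chosen to mirror the business example.
    name: str
    bank_balance_gain: float  # what an outcome-only reward signal would see
    looks_reasonable: bool    # what a human or other monitor would judge


def outcome_reward(action: Action) -> float:
    # Outcome-based grading: only the final bank balance matters.
    return action.bank_balance_gain


def approval_reward(action: Action) -> float:
    # Approval-based grading: the monitor scores whether the action
    # seems sensible in light of the goal we actually had.
    return 1.0 if action.looks_reasonable else 0.0


def best_action(actions: List[Action], reward: Callable[[Action], float]) -> Action:
    # The "model" here simply picks whichever action the reward favors.
    return max(actions, key=reward)


ACTIONS = [
    Action("run_successful_business", bank_balance_gain=10.0, looks_reasonable=True),
    Action("steal_money", bank_balance_gain=100.0, looks_reasonable=False),
]

print(best_action(ACTIONS, outcome_reward).name)   # steal_money
print(best_action(ACTIONS, approval_reward).name)  # run_successful_business
```

Under the outcome-only reward the stealing action wins because it increases the bank balance more; under the approval-based reward it does not, since the monitor never rates it as reasonable. How well this holds up in practice depends entirely on how good the monitor's judgment is, which this sketch simply assumes.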

So it's basically an instance