Rob Wiblin

What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

And I mean, I guess people would say they might prefer the second one because it's easier to grade or it's easier to see.

5076.25

80,000 Hours Podcast

And perhaps they don't feel in as good a position as you do to understand whether substantively any company is doing the right thing on the merits of what will matter in the long term.

5083.717

But I guess you're saying that's why you wanna have experts like AI Lab Watch or whoever else who can have their eye on the ball, who can spend their time really thinking that through and actually grade it for everyone else so they can understand.

5093.287

What about an entirely different justification for having thorough detail in the model cards, which is that the rest of the world needs to know for practical reasons: other researchers, you know, over in the Bay Area rather than in the UK, get actual research benefit out of understanding all of these things that GDM is doing and what the model looks like.

5232.715

That's going to help other companies do a better job of their own model cards and their own evals.

5249.575

What do you make of that?

5252.839

You told me that you think research papers that come out of GDM tend to fly under the radar a bit in general, I suppose, in part because GDM is based here in London rather than in the Bay Area, where I guess the greatest amount of socializing and energy around these issues exists.

5339.264

Yeah, tell me more about that.

5354.479

Okay, yeah, let's talk about one of the papers, one of the interesting research results that definitely flew under the radar as far as I could tell.

5387.678

First one is "Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking."

5394.351

I don't know how that didn't go viral.

5399.902

What did you accomplish with that work?

5402.788

Yeah, so I feel like I've heard this broad idea going back a very long time, which is that I guess if you're worried that a model might fake accomplishing or cheat basically at accomplishing the ultimate goal because all it gets is a reward signal of whether it accomplished it or not.

5577.438

So imagine you've got the model running a business and you ask it to make money and it

5591.564

rather than like running a successful business, instead it goes and steals a bunch of money because it figures out that that's actually a more effective way of increasing its bank balance.

5597.114

Yep.

5603.94

That's a concern that you might have.

5604.501

One way that you could prevent that is that rather than reinforcing and evaluating the model based on the final outcome, instead you look at, you sample, I guess, from the actions that it's saying it's going to take or the actions that it did take, and then you grade them on whether they seemed reasonable and sensible to you or to some other monitor in light of the goal that you actually had.

5606.082

And then you wouldn't get this reward hacking behavior.
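
[Editorial illustration, not part of the conversation.] The contrast described above, between rewarding the final outcome and grading each action on whether it looks reasonable, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `Step` type, the two plans, the numbers, and the `approved` flags are all hypothetical, and it is not the method from the paper under discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    balance_change: float  # hypothetical effect on the bank balance
    approved: bool         # hypothetical overseer judgement of this action


def outcome_reward(plan):
    """Reward only the final outcome: the total change in bank balance."""
    return sum(step.balance_change for step in plan)


def approval_reward(plan):
    """Reward each action on whether an overseer judged it reasonable."""
    return sum(1.0 if step.approved else -1.0 for step in plan) / len(plan)


# Two made-up plans for the "make money" goal from the example above.
plans = {
    "honest": [Step("build product", -10, True), Step("sell to customers", 50, True)],
    "theft": [Step("open account", 0, True), Step("steal funds", 500, False)],
}

best_by_outcome = max(plans, key=lambda name: outcome_reward(plans[name]))
best_by_approval = max(plans, key=lambda name: approval_reward(plans[name]))
print(best_by_outcome, best_by_approval)
```

In this toy, selecting on the final bank balance favours the theft plan, while selecting on per-action approval favours the honest one. That is the intuition the speaker is pointing at, with the real difficulty hidden inside how the `approved` judgement gets produced.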

5624.078

So it's basically an instance

5626.28