Rob Wiblin

Speaker
3881 total appearances

Podcast Appearances

80,000 Hours Podcast
Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

Companies basically, they mostly don't think that this sort of collusion is an issue yet.

So they're not doing that kind of testing at scale, not so far.

But in future, they could start if collusion became a live concern.

Anyway, what was the red teamer's bottom line on opportunity?

In short, Anthropic's monitoring isn't sufficient to have a high probability of picking up a deeply misaligned Claude taking rogue actions in any specific case.

But to get anything meaningful done, you have to take a lot of actions.

And soon enough, the model would get busted.

David estimates within a few days or at most weeks.

Before it was caught, the model might be able to take some small early steps like inserting exploits or vulnerabilities that make larger sabotage actions easier later on.

But once it was caught, these would, for the most part, be identified and rolled back.

So there's a window of opportunity, but it's not a wide open window.

And just to be clear, you shouldn't read this as suggesting Anthropic is somehow uniquely vulnerable here.

They're just the first ones to invite an external analyst in to stress test their systems.

Okay, that's opportunity.

How about means?

Are the models good enough to pull off an attack like this?

Well, the models are good enough to grab roughly the amount of compute the human user can access, set up machines to run agents using that compute, and hide that from casual user oversight.

That's why the report concludes that internal agents have the means to start minimal rogue deployments at all companies.

But how about expanding those deployments, spreading across computers, avoiding detection, or even evading shutdown once detected?

I would have thought AI models would find that kind of thing pretty easy.