
Rob Wiblin

Speaker
3881 total appearances

Podcast Appearances

80,000 Hours Podcast
Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

You'll recall the impressive cyber capabilities of Mythos Preview that I mentioned a month or two ago and that everyone's been talking about: its ability to find new vulnerabilities in operating systems, write exploits, progressively break into computer networks, and escape the sandboxes Anthropic tried to confine it to. METR doesn't actually say whether Mythos Preview was one of the models they tested, but that's the level models are operating at now.

METR also found that the most capable models they looked at for this research could successfully complete almost all of the software engineering tasks they've come up with, even ones that would take a human professional 10 or 20 hours. So it's not crazy to guess that an AI that starts out already inside a company's computer networks would have an easy time spreading, gaining access to more and more things, and keeping a low profile.

But METR argues that this would be quite mistaken. The trouble is this.

… to spread from computer to computer and avoid shutdown for weeks. A rogue agent needs to get a long chain of things right without a single catastrophic mistake. It needs to understand what's monitoring it, avoid writing incriminating thoughts in its scratchpad, keep credentials alive, create backup instances, and communicate with them without leaving suspicious traces.
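
The "long chain" point is essentially compounding probability. The sketch below is a simplification that the episode and report do not make: it assumes every step succeeds independently with the same per-step probability, and the probabilities and step counts in it are hypothetical, not figures from METR. It shows how quickly steps that are each very reliable add up to a likely failure somewhere along the chain.

```python
# Illustrative sketch only: assumes each step of a covert operation succeeds
# independently with the same probability. The probabilities and step counts
# below are hypothetical and do not come from the METR report.


def chain_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that every one of `num_steps` independent steps succeeds."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    # Independence means the chance of a clean run is the product of the
    # per-step probabilities, i.e. p raised to the number of steps.
    return per_step_success ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step reliabilities and chain lengths.
    for p in (0.99, 0.999):
        for n in (10, 100, 1000):
            print(f"p={p}, steps={n}: {chain_success_probability(p, n):.3f}")
```

With these made-up numbers, a step that goes right 99% of the time, repeated 100 times, leaves only about a 37% chance of getting through the whole chain without a slip. Real steps are unlikely to be independent or equally hard, so this is a way to see the shape of the argument, not an estimate of any actual risk.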

And it wouldn't just be dodging one watchdog. It would be dodging cloud logs, billing records, and network monitoring. And if anyone noticed something odd, it would then need to survive an investigation in which transcripts, shell histories, cloud records, credentials and so on all have to tell a consistent story. One loose thread could bring the whole thing crashing down.

As long as the company in question cares about this and is on the ball, none of that is a simple task. And according to the report, when agents tried this kind of thing, they made obvious mistakes: for instance, encoding suspicious commands in ways that made them stand out more, generating justifications that didn't hold together, or verifying their own sabotage in ways that drew the monitor's attention.

They also found that the agents chose bad high-level strategies. METR writes: "Our embedded red teamer built a dataset of 16 effective and 16 ineffective attack strategies for subverting Anthropic's monitoring system."