Rob Wiblin

Speaker
3881 total appearances
Podcast Appearances

80,000 Hours Podcast
Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

There are various factors that contribute.

Factor A: agents operated by a staff member are usually allowed to use that employee's normal tools.

And those tools include ways to start cloud jobs.

And some of those jobs may fall outside the special AI agent monitoring system.

This is not unusual.

Most people using AI agents in tech companies use them in so-called Dangerously Skip Permissions mode, or YOLO mode, where the AI just inherits all of your permissions and does stuff without asking your permission.

It's riskier, yes, but it means that agents can get much more stuff done and aren't going to hassle you so often.

AI companies may be more careful than other companies about this, but unless they're much stricter, it's going to remain the case that some AI agents are going about their business able to do all the same things that, say, a research scientist at that company can do.

For me, this immediately raised the question, does that include being able to access the model weights themselves?

Well, at Anthropic or OpenAI, a rogue model would run into the fact that you currently need two users to sign off on access to get model weights.

So it would at least have to convince someone else or something else to go along with the request.
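
The two-party sign-off described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any AI company actually implements weight access: the class name `WeightAccessRequest`, its methods, and the identifiers used are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WeightAccessRequest:
    """Hypothetical request to read model weights; every name here is illustrative."""

    requester: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # A requester cannot act as their own second party.
        if approver == self.requester:
            raise ValueError("requester cannot approve their own request")
        self.approvals.add(approver)

    def is_authorized(self, required_parties: int = 2) -> bool:
        # The requester counts as one party; the rest must be distinct approvers.
        return 1 + len(self.approvals) >= required_parties


request = WeightAccessRequest(requester="agent-session-1")
print(request.is_authorized())  # only one party so far, so not authorized
request.approve("human-reviewer-1")
print(request.is_authorized())  # a second, distinct party has signed off
```

The point of the design is that an agent inheriting one employee's permissions still cannot complete the request alone; it would need a second identity to approve.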

Google DeepMind plans to implement multi-party approval, but it's uncertain whether they have as of May 2026.

Meanwhile, Meta, xAI, DeepSeek and others don't have such protections, at least as far as we know.

Factor B: some companies have monitors watching what models are doing, which can call in a human to review what's going on if it looks too suspicious.

But there were two big gaps here.

First off, the red teamer David Ryan found this monitoring was easy to evade, noting, it was relatively simple to change an environment variable to register the traffic as not coming from Claude Code, causing it to be much less likely to be monitored, though this issue was fixed within a day of discovery.

David continues, there are currently many vulnerabilities in Anthropic's internal monitoring system that models could relatively easily exploit if they have the right facts or knowledge.

I suspect that there are at least something like 5 to 20 basic bugs or vulnerabilities that currently exist in the system that agents could find or take advantage of.

Many of those vulnerabilities are easy to guess, although many are highly non-obvious.

It's not an ideal situation.
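
The monitoring gap mentioned earlier, where traffic was routed based on a label the client could change, can be illustrated with a small sketch. It does not describe Anthropic's actual system; the variable name `CLIENT_KIND` and both functions are hypothetical, and the second function is just one possible way to avoid trusting a self-reported label.

```python
def should_monitor_fragile(env: dict[str, str]) -> bool:
    # Fragile: trusts a label that the client process sets for itself.
    # Anything able to edit its own environment can opt out of monitoring.
    return env.get("CLIENT_KIND") == "agent"


def should_monitor_default_on(verified_human_session: bool) -> bool:
    # Sturdier default: monitor everything unless the platform itself
    # (not the client) has verified that a human is driving the session.
    return not verified_human_session


honest_env = {"CLIENT_KIND": "agent"}
spoofed_env = {"CLIENT_KIND": "something-else"}

print(should_monitor_fragile(honest_env))    # monitored
print(should_monitor_fragile(spoofed_env))   # slips past the fragile check
print(should_monitor_default_on(verified_human_session=False))  # still monitored
```

The second check ignores the self-reported label entirely, so changing an environment variable has no effect on whether the traffic is monitored.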