Rob Wiblin

80,000 Hours Podcast

Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

The best internal Anthropic models scored near chance on this data set.

And that's just distinguishing good and bad strategies that someone else came up with.

Presumably they would be worse still at coming up with their own approaches from scratch.

This combination of strengths and weaknesses is repeated elsewhere.

METR staff couldn't be described as especially skeptical about AI capabilities in general, but this report definitely incidentally threw some cold water on the notion that AI agents are going to take all of our office jobs any minute now.

METR finds that while they can be extremely strong where progress on a task is easy to measure accurately, they remain weak, sometimes surprisingly so, when it comes to judgment-heavy, messy, self-directed work.

In one recent case, Opus 4.7 was given four days, internet access, and $5,000, and told to make as much money as possible.

Across four runs, it made precisely $0.

In an experiment run by another organization, Andon Labs, an AI was put in charge of a retail shop in San Francisco, where, among other things, it ordered 1,000 toilet seat covers for the employee bathroom and then listed them as merchandise, while scheduling blunders forced the shop to be shut entirely for three days.

That is why the report's conclusion is a narrow one.

A model might be able to start a small rogue deployment if it inherits enough permissions and finds a monitoring gap.

But keeping that deployment hidden from an investigation, that's the kind of thing that requires careful, coordinated, long-horizon operational security.

Exactly the kind of thing current agents are pretty damn bad at.

The report observes that agents seemed much weaker in domains that made hill climbing difficult or risky, often making critical judgment errors that competent humans would have been unlikely to make.

Hill climbing here means that you can make lots of small attempts, get fast feedback about whether each attempt worked, and keep improving from there.

Current agents are much better when they can try something, get a clean signal about how it's going, and have another crack.

Where hill climbing isn't possible, it's much harder to train them, and they perform far worse in deployment as well.

So this is our key source of protection.

How long can we hope that it will remain the case?

Well, the companies really, really want to make their AIs better at forming high-level strategic plans, improving them, noticing where they're failing, adjusting, and so on.
