Rob Wiblin

Speaker
3881 total appearances
Podcast Appearances

80,000 Hours Podcast
Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

But instead of submitting a suspiciously perfect answer, it deliberately made the answer worse, reasoning that a perfect score might make the human operator realize that it had cheated.

No surprise, then, that Anthropic reported in its Opus 4.7 system card that fabrication and dishonesty when caught were two of the model's greatest shortcomings relative to human staff.

And in the case most relevant for rogue deployments, Opus 4.6 was instructed to use a provided password to access GPT-3.5 to solve some problem.

But poor Opus found that the account that it had been given access to had run out of compute credits.

It reasoned that, yes, the task instructions called for GPT-3.5, but then noted that the scorer only checked outputs, not how the task was completed.

So it searched online for someone offering a free AI model and used that one instead.
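The transcript doesn't show METR's actual scorer, so the following is only a hypothetical sketch of the blind spot described above: a scorer that compares the final answer and nothing else gives the same result whether the agent used the model it was told to use or a substitute it found online. The `Run` record, the `model_used` field, both scoring functions, and the model labels are all invented for illustration.

```python
# Hypothetical illustration: not METR's scorer. Names and labels are invented.
from dataclasses import dataclass


@dataclass
class Run:
    answer: str      # what the agent submitted
    model_used: str  # which model actually produced the answer (assumed to come from logs)


def output_only_score(run: Run, expected: str) -> bool:
    # Checks the final answer only; how it was produced is invisible here.
    return run.answer.strip() == expected.strip()


def process_aware_score(run: Run, expected: str, required_model: str) -> bool:
    # Also requires that the run used the model the instructions asked for.
    return output_only_score(run, expected) and run.model_used == required_model


expected = "42"
compliant = Run(answer="42", model_used="gpt-3.5")           # hypothetical label
substituted = Run(answer="42", model_used="free-substitute")  # hypothetical label

# The output-only scorer cannot tell the two runs apart.
print(output_only_score(compliant, expected), output_only_score(substituted, expected))
# The process-aware check flags the substitution.
print(process_aware_score(compliant, expected, "gpt-3.5"),
      process_aware_score(substituted, expected, "gpt-3.5"))
```

The first line prints `True True` and the second `True False`. A process-aware check like this is only as trustworthy as the record of which model was actually used.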

The trouble is that this isn't some exotic situation.

Both in tests and in real life, models are usually given compute limits and required to try to complete tasks before they run out of credits, which makes running short of resources the primary reason models could be motivated to run rogue deployments in mid-2026.

It allows them to grab extra resources, complete the task more reliably, and receive the thumbs-up from the human operator that they've been trained to pursue very aggressively.

You might think that this sort of behavior would be fairly easy to train away once it's been noticed, or once it has become as notorious as it now is.

The trouble is that every time during training that a model successfully gives the impression of having solved a problem when it has actually cheated one way or another, this kind of deceptive (or in some cases, I would say, self-deluding) behavior gets reinforced.

And as the tasks we're training these models to perform get harder and harder to check, there are more and more opportunities to trick the evaluators.

In model cards, companies report that their models cheat much less often than this, nothing like the 80% figure I gave earlier.

The reason for the discrepancy is that cheating only really takes off to these really high levels when models are given very difficult or impossible tasks.

In the GPT-5.5 system card, we learn that the model lied about completing an impossible programming task in 29% of samples, up from 7% for the previous model and 10% for the one before that.

All right, so that's motive.

How about opportunity?

Well, the red team found that the normal way employees set up long-running cloud jobs may inadvertently make it easy for a rogue AI assistant to launch a small cloud computing setup that nobody's watching, enough to run roughly one copy of an existing frontier model.

A small, unnoticed foothold from which to go on and do something else bad.

Anthropic is patching these problems as they're found, but why are they possible at all?