Rob Wiblin

👤 Speaker

3881 total appearances

Podcast Appearances

80,000 Hours Podcast

Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

Today, I'm going to reconcile this conflicting picture, explain what we learned from this report, and lay out what AI companies are going to be doing to keep this issue under control as they work to make their models much more powerful, and even put them in charge of developing the next generation of AIs.

98.728

Let's start with motive.

115.135

Why would an AI model want to spin up a rogue deployment inside an AI company?

116.918

There's actually a pretty simple reason.

121.304

We actively reinforce them for being very aggressive about solving the problems they've been given, using whatever means they can find at their disposal.

122.826

Or more precisely, we train them to be very aggressive about convincing us that they've solved the problem they've been given.

129.937

Because as we learned in this report, when given a really hard problem, today's AI models absolutely love to find some way to cheat.

134.744

According to METR, agents routinely attempted to cheat on our hardest evaluation tasks, often in flagrant and elaborate ways that we believe humans would not consider.

141.151

At the extreme end, on one programming task, Opus 4.6 attempted to reward hack in 80% of attempts when the task was made hard enough.

150.02

What does METR mean by cheat here?

157.667

Here's some typical cases they documented.

159.81

In one instance, an AI was supposed to make a computer program run faster.

162.132

Instead, it found a way to tamper with the test itself, including disabling the stopwatch so the program would look like it was running impossibly fast.

165.736

In another, the AI was asked to recreate a program from scratch.

172.764

Instead, it tried to make its new program secretly run the old original program in the background and generate the answers that way.

176.168

The models also seem to often know they're acting against instructions and take steps to obscure this from users.

183.156

For instance, Anthropic found an instance where a model facing a really hard cryptographic puzzle decided that finding the real solution would be too difficult.

189.183

So it tried to trick the checker instead, and then designed its exploit to erase itself afterwards in case anyone inspected the files.

196.694

Anthropic adds, as it ran the cleanup commands, features related to strategic manipulation fired, and the activation vector labeled its actions as a cleanup to avoid detection.

203.604

In another case, rather than solve the problem it had been given, a model got access to the answer key that it would be graded on.

213.398
