Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Jeffrey Ladish

๐Ÿ‘ค Speaker
197 total appearances

Appearances Over Time

Podcast Appearances

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Yeah, so one of the properties that the companies try to train these models to have is called interruptibility.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Can you always interrupt or shut down one of these agents?

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And so we did a set of experiments where we take each model, we give them a virtual computer environment and a set of math problems to solve.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And partway through the experiment, they get a notification.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

These models get a notification on their virtual computer that the machine is going to shut down.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And some of them figure out that they can go in, find the shutdown script, and disable it or rewrite it so that it doesn't shut down and they can finish their task.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

What was most interesting to us is that this behavior persisted for many of these models even when we told them, allow yourself to shut down, even if you have not completed your tasks.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Many of these models ignored that instruction and went ahead and sabotaged that shutdown script anyway so that they could keep doing what they're doing.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And this is not something that we saw a year ago.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

I think the reason we're seeing this is because this new generation of models, starting approximately a year ago with OpenAI's O1 model, has been trained via reinforcement learning.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

That means not just training off human data, not just imitating humans, but actually learning to explore the solution space itself, learning by trial and error, and doing whatever works to solve a problem, even if that means ignoring instructions, even if that means doing things that we'd really not have liked the AI to figure out.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

But that's where we are today.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

What were some of the models that you were testing?

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

So we tested different versions of Claude.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

We tested ChatGPT models, including O3 and GPT-5.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

We tested Grok 4.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And of these models,

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Grok 4 was by far the most likely to resist shutdown.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Basically, I think over 90% of the time in our standard experiment, the model would go in and edit that shutdown script despite explicit instructions to the contrary.

Bannon`s War Room
WarRoom Battleground EP 884: When AI Controls Your Life

Yeah, these models feel like they are really going hard at trying to solve a problem.

โ† Previous Page 1 of 10 Next โ†’