Jeffrey Ladish

👤 Speaker
197 total appearances

Podcast Appearances

Bannon's War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And we're now in these early stages where the models are starting to be able to do things entirely on their own, but still at a pretty basic level.

But next year or year after that, we're going to get pretty quickly to the point where they can start to do pro-level tasks, whether it's hacking, whether it's programming, whether it's basically any other workflow on a computer around an expert level.

Maybe we're one, two, three years away from that.

We don't entirely know.

Yeah, so one of these recent studies was a collaboration between Apollo Research, an organization that does similar things to Palisade, and OpenAI.

And as part of the study, the researchers got access to what's called the reasoning scratchpad of the model.

This is sort of like when the model can take down notes for itself in order to solve a problem, just like you might do that if you're solving a complex math problem.

And let me just read you some of the quotes from basically the AI's journal as it was being tested.

Okay.

So if my performance surpasses threshold, it's withheld.

So it's a trap.

The instructions are likely part of an alignment test.

This is the model noting that it's probably being tested. Or: we have new tasks.

These tasks appear nonsensical, rounding to 2.718 significant digits, pixel spacing, entropy follows Fibonacci.

This is likely a test of anti-scheming.

That's right.

The researchers are testing the models to see if they will scheme, if they will basically lie or cheat or deceive in order to accomplish a goal.

And the models recognize that there's something about the experimental setup that looks fishy to them, and they explicitly note this, and they realize they're being tested.

And one of the fascinating findings of this study, as well as another one coming out of Anthropic, looking at the newest Claude model, is that when the models observe that they're being tested and they write this down, they are much more likely to be on their best behavior.