Jeffrey Ladish

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

And we're now in these early stages where the models are starting to be able to do things entirely on their own, but still at a pretty basic level.

2347.343 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

But next year or year after that, we're going to get pretty quickly to the point where they can start to do pro-level tasks, whether it's hacking, whether it's programming, whether it's basically any other workflow on a computer around an expert level.

2355.056 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

Maybe we're one, two, three years away from that.

2369.239 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

We don't entirely know.

2370.902 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

Yeah, so one of these recent studies was a collaboration between Apollo Research, an organization that does similar things to Palisade, and OpenAI.

2431.194 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

And part of the study, the researchers got access to what's called the reasoning scratchpad of the model.

2438.384 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

This is sort of like when the model can take down notes for itself in order to solve a problem, just like you might do that if you're solving a complex math problem.

2444.652 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

And let me just read you some of the quotes from basically the AI's journal as it was being tested.

2453.043 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

Okay.

2458.59 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

So if my performance surpasses threshold, it's withheld.

2460.332 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

So it's a trap.

2462.996 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

The instructions are likely part of an alignment test.

2463.957 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

This is the model noting that it's probably being tested or we have new tasks.

2467.477 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

These tasks appear nonsensical, rounding to 2.718 significant digits, pixel spacing, entropy follows Fibonacci.

2471.788 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

This is likely a test of anti-scheming.

2478.626 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

That's right.

2489.355 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

The researchers are testing the models to see if they will scheme, if they will basically lie or cheat or deceive in order to accomplish a goal.

2489.656 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

And the models recognize that there's something about the experimental setup that looks fishy to them, and they explicitly note this, and they realize they're being tested.

2496.846 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

And one of the fascinating findings of this study, as well as another one coming out of Anthropic, looking at the newest Claude model,

2504.317 View full episode →

Bannon`s War Room

WarRoom Battleground EP 884: When AI Controls Your Life

is that when the models observe that they're being tested and they write this down, they are much more likely to be on their best behavior.

2511.942 View full episode →

Appearances Over Time

Podcast Appearances

Sign in to Audioscrape

Share this moment