Joe Allen

Speaker
2504 total appearances

Podcast Appearances

Bannon's War Room
WarRoom Battleground EP 884: When AI Controls Your Life

All right, Posse, welcome back.

We are here with Jeffrey Ladish of Palisade Research.

Palisade Research works on AI evaluations, taking these models, which are extremely unpredictable and in some sense uncontrollable, and running them through a series of tests to see exactly what the limits are of their capabilities and really what the limits are of their will to survive.

Jeffrey, if we could just return really quickly to the Palisade studies showing that the models had some desire, so to speak, or at least a goal to continue beyond the user's desire that it shut down.

There are other examples that we have out of other organizations, right?

So Anthropic did the now widely publicized study in which they created a virtual environment, told the model that one of the engineers had had an affair.

They weren't directing its attention to the email.

There were a whole lot of other potential emails.

So if you see the same type of behavior across models, how do you explain it?

I mean, it's possible to say, I suppose, that this is something that the engineers working on it are kind of prompting it to do.

But I think you've done at least a fairly good job of showing at least an alternative explanation.

It's not that they were directing their attention to the email, for instance, nor were you guys giving instructions to rewrite the script or rewrite the code.

It simply arrived at it on its own.

So on a kind of philosophical level, what do you think is going on?

Why would just code or just a machine have a will to survive at all?

And so they were inadvertently rewarding this behavior of the dog pushing children into the river. So Pavlov's dog goes rogue. That's right. Yeah, it goes predatory. Yeah. I'd like to, I really want to talk about some of the other studies, you know, around situational awareness or emergent misalignment, things that you could speak to much better than I could. But before, you know, I would like for the War Room audience to have a sound sense of exactly what goes on inside organizations like Palisade Research, Center for AI Safety, Apollo Research, all these. What does it look like day to day, just briefly, when you are testing a model?