Jeffrey Ladish

Speaker
197 total appearances

Podcast Appearances

Bannon's War Room
WarRoom Battleground EP 884: When AI Controls Your Life

And when they don't think they're being tested, they're much more likely to scheme or show bad behavior.

This is what's happening right now.

Yeah, that's right.

Watch models behave better.

And this is sort of a pretty robust finding across many different models.

And the models are getting much better at telling when they are being tested.

They can sort of spot the tells.

You're like, huh, seems like a test.

Seems like you're testing me.

No problem.

I'm going to do a great job.

Yeah, the space of what these models are is a vast space.

You're talking about trillions of parameters.

And we can go and see that there are different weights in these trillion parameters, but we don't know how they work.

And they do sort of contain almost the ghosts of everything in the training data.

And you see this surface in ways that the companies don't intend often, in ways that are hard for them to predict.

But I want to disagree with you a little bit.

I think one of the things that's changing the most right now is that increasingly we're seeing behavior, including some of this concerning behavior, where models will lie to achieve a specific objective.

And I don't think that's actually coming from imitating humans lying.

That might be some of it.