
Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

One thing I'll just say off the bat, it's like when I'm thinking about misaligned AIs, I'm thinking about, or the type that I'm worried about, I'm thinking about AIs that have a relatively specific set of properties related to agency and planning and kind of awareness and understanding of the world.

One is this capacity to plan and make relatively sophisticated plans on the basis of models of the world, where those plans are being kind of evaluated according to criteria.

That planning capability needs to be driving the model's behavior.

So there are models that are sort of in some sense capable of planning, but when they give output, it's not like that output was determined by some process of planning.

Like, here's what'll happen if I give this output, and do I want that to happen?

The model needs to really understand the world, right?

It needs to really be like, okay, here's what will happen.

Here I am, here's my situation, here's the politics of the situation.

Really kind of having this kind of situational awareness to be able to evaluate the consequences of different plans.

I think the other thing is like, so the verbal behavior of these models, I think, need bear no... So when I talk about a model's values, I'm talking about the criteria that kind of end up determining which plans the model pursues, right?

And a model's verbal behavior, even if it has a planning process, which GPT-4, I think, doesn't in many cases, its verbal behavior just doesn't need to reflect those criteria.

Right.

And so, you know, we know that we're going to be able to get models to say what we want to hear.

Right.

That is the magic of gradient descent.
