Joe Carlsmith
One thing I'll just say off the bat: when I'm thinking about misaligned AIs, the type I'm worried about, I'm thinking about AIs that have a relatively specific set of properties related to agency, planning, and awareness and understanding of the world.
One is this capacity to plan: to make relatively sophisticated plans on the basis of models of the world, where those plans are being evaluated according to criteria.
That planning capability needs to be driving the model's behavior.
So there are models that are in some sense capable of planning, but when they give an output, it's not like that output was determined by some process of planning, like: here's what'll happen if I give this output, and do I want that to happen?
The model needs to really understand the world, right?
It needs to really be like, okay, here's what will happen.
Here I am, here's my situation, here's the politics of the situation.
It really needs that kind of situational awareness to be able to evaluate the consequences of different plans.
I think the other thing is that the verbal behavior of these models need bear no relation to that. When I talk about a model's values, I'm talking about the criteria that end up determining which plans the model pursues, right? And a model's verbal behavior, even if it has a planning process, which GPT-4, I think, doesn't in many cases, doesn't need to reflect those criteria.
Right. And so, you know, we know that we're going to be able to get models to say what we want to hear. That is the magic of gradient descent.