
Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

So, you know, maybe the AIs are like, they really want to be, like, shmelpful and, like, shmonest and shmarmless, right?

But their concept is, like, importantly different from the human concept.

And they know this.

So they know that the human concept would mean blah.

But they, like, ended up... Their values ended up fixating on, like, a somewhat different structure.

Yeah.

So that's, like, another version.

And then a fourth version... or a fifth version, which I think, you know, I think about less because I think it's just, like, such an own goal if you do this, but I do think it's possible.

It's just like, you could have AIs that are actually just doing what it says on the tin.

Like you have AIs that are just genuinely aligned to the model spec.

They're just really, they're just really trying to, like, benefit humanity and reflect well on OpenAI.

And what's, what's the, what's the other one?

Help that, you know, assist the developer or the user, right?

Yeah.

But your model spec, unfortunately, was just not robust to the degree of optimization that this AI is bringing to bear.

And so, you know, when it's looking out at the world, it's like, what's the best way to benefit OpenAI, or sorry, reflect well on OpenAI and benefit humanity and such?

And so it decides that, you know, the best way is to go rogue.

That's, I think that's, like, a real own goal, because at that point you, like, you got so close, you know. You really, you just have to write the model spec well, um, and red-team it suitably.