Joe Carlsmith

Let's say we've got an AI, and, again, let's bracket the question of exactly how capable it will be and just talk about this extreme scenario where it really has the opportunity to take over, right?

346.813

Dwarkesh Podcast

Joe Carlsmith - Otherness and control in the age of AGI

Which, I do think, maybe we just do not want to deal with: having to build an AI that we're comfortable having in that position. But let's just focus on it for the sake of simplicity, and then we can relax the assumption.

359.392

Okay, so you have some hope.

371.737

It's like, I'm going to build an AI over here.

373.521

So one issue is you can't just test.

374.944

You can't give the AI this literal situation, have it take over and kill everyone, and then be like, "Oops, update the weights."

377.53

This is the thing Eliezer talks about: you care about its behavior on this specific scale.

384.565

In a specific scenario that you can't test directly.

390.097

Now, we can talk about whether that's a problem, but one issue is that there's a sense in which this has to be kind of off-distribution, and you have to be getting some kind of generalization from training the AI on a bunch of other scenarios.

393.743

And then there's the question of how it is going to generalize to the scenario where it really has this option.

407.364

Yeah.

444.264