Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

So that's a very specific scenario.

And if we're in a scenario where power is more distributed and especially where we're doing like decently on alignment, right?

And we're giving the AI some amount of inhibition about doing different things.

And maybe we're succeeding in shaping their values somewhat.

Now it is, I think it's just a much more complicated calculus, right?

And you have to ask, okay, like, what's the upside for the AI?

What's the probability of success for this like takeover path?

So, you know, why is alignment hard in general, right?

Like, let's say we've got an AI, and let's, again, let's bracket the question of, like, exactly how capable will it be, and really just talk about this extreme scenario of, like, it really has this opportunity to take over, right?

Which I do think, you know, maybe we just want to not, we do not want to deal with that, with having to build an AI that we're comfortable being in that position, but let's just focus on it for the sake of simplicity, and then we can relax the assumption.

You know, okay, so you have some hope.

It's like, I'm going to build an AI over here.

So one issue is you can't just test.

You can't give the AI this literal situation, have it take over and kill everyone and then be like, oops, like update the weights.

This is the thing Eliezer talks about of sort of like, you can't, you know, you care about its behavior on this like specific scale.

In a specific scenario that you can't test directly.

Now, we can talk about whether that's a problem, but that's like one issue is that there's a sense in which this has to be kind of like off distribution and you have to be getting some kind of generalization from your training the AI on a bunch of other scenarios.

And then there's this question of how is it going to generalize to the scenario where it really has this option.

Yeah.