Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

right?

You understand it.

This is why I'm saying like the, when the model, you're like, Oh, the models really understand human values.

It's like, yeah.

Yes, I think that's... So yeah, I think basically a decent portion of the hope here, or I think we should just... An aim should be we're never in the situation where the AI really has very different values.

already is quite smart, really knows what's going on, and is now in this kind of adversarial relationship with our training process, right?

So we wanna avoid that.

The main thing, and I think it's possible we can via the sorts of things you're saying.

So I'm not like, ah, that'll never work.

The thing I just wanted to highlight was like, if you get into that situation,

And if the AI is genuinely, at that point, much, much more sophisticated than you and doesn't want to reveal its true values for whatever reason, then, you know, when the children show some kind of obviously fake opportunity to defect to the allies, right?

You, you know, it's sort of not necessarily going to be a good test of what will you do in the real circumstance because you're able to tell.

Yeah.

I mean, I don't know.

I think I'm hesitant to be like, it's like drugs for the model.

Like I think there's... But broadly speaking, I do...

basically agree that we have really quite a lot of tools and options for training AIs, even AIs that are somewhat smarter than humans.

I do think you have to actually do it.

So, you know, compared to maybe Eliezer, whom you had on, I think I'm much more bullish on our ability to solve this problem, especially for AIs that are

in what I think of as the AI for AI safety sweet spot, which is this sort of band of capability where they're both sufficiently capable that they can be really useful for strengthening various factors in our civilization that can make us safe.