Joe Carlsmith
Right?
You understand it.
This is why I'm saying, when you're like, "Oh, the models really understand human values," it's like, yeah.
Yes, I think that's... So yeah, I think basically a decent portion of the hope here, or an aim, should be that we're never in the situation where the AI really has very different values, is already quite smart, really knows what's going on, and is now in this kind of adversarial relationship with our training process, right?
So we wanna avoid that.
That's the main thing, and I think it's possible we can avoid it via the sorts of things you're saying.
So I'm not like, ah, that'll never work.
The thing I just wanted to highlight was: if you get into that situation, and if the AI at that point is genuinely much, much more sophisticated than you and doesn't want to reveal its true values for whatever reason, then, you know, when it's shown some kind of obviously fake opportunity to defect to the Allies, like the children in that example, right?
It's not necessarily going to be a good test of what it will do in the real circumstance, because it's able to tell.
Yeah.
I mean, I don't know.
I think I'm hesitant to say it's like drugs for the model. Like, I think there's... But broadly speaking, I do basically agree that we have really quite a lot of tools and options for training AIs, even AIs that are somewhat smarter than humans.
I do think you have to actually do it.
So, you know, compared to, say, Eliezer, who I think you've had on, I'm much, much more bullish on our ability to solve this problem, especially for AIs that are in what I think of as the "AI for AI safety sweet spot": this band of capability where they're both sufficiently capable that they can be really useful for strengthening the various factors in our civilization that can make us safe.