Joe Carlsmith
So, our alignment work, control, cybersecurity, general epistemics, maybe some coordination applications, stuff like that. There's a bunch of stuff you can do with AIs that in principle could differentially accelerate our security with respect to the sorts of considerations we're talking about.
If you have AIs that are capable of that, and you can successfully elicit that capability in a way that isn't being sabotaged or messing with you in other ways, and they can't yet take over the world or engage in some other really problematic form of power-seeking, then I think if we were really committed, we could go hard: put in a ton of resources and differentially direct this glut of AI productivity towards these security factors. And hopefully control and understand the AIs, do a lot of the things you're talking about, to make sure they don't take over or mess with us in the meantime.
And I think we have a lot of tools there.
You have to really try, though.
It's possible that those sorts of measures just don't happen, or don't happen with the level of commitment, diligence, and seriousness you would need. Especially if things are moving really fast and there are other competitive pressures: running all these intensive experiments on the AIs is going to take compute, and that compute could instead go towards experiments for the next scaling step, and stuff like that.
So, yeah.
I'm not here saying this is impossible, especially for that band of AIs. It's just that I think you have to try really hard.
Yeah.
It is the case that by the time we're building superintelligence, we'll have much better techniques.
I mean, even right now, when you look at labs talking about how they're planning to align the AIs, no one is saying, "we're going to do RLHF." At the least, you're talking about scalable oversight. You have some hope about interpretability. You have automated red teaming.