Full Episode
Today, I'm chatting with Joe Carlsmith. He's a philosopher, in my opinion a capital-G great philosopher, and you can find his essays at joecarlsmith.com. So we have GPT-4, and it doesn't seem like a paperclipper kind of thing. It understands human values. In fact, you can have it explain why being a paperclipper is bad.
Or just ask it for its opinions about being a paperclipper, to explain why the galaxy shouldn't be turned into paperclips. Okay, so what is happening such that, dot dot dot, we end up with a system that takes over and converts the world into something valueless?
One thing I'll say off the bat: when I'm thinking about misaligned AIs, or the type that I'm worried about, I'm thinking about AIs that have a relatively specific set of properties related to agency, planning, and awareness and understanding of the world. One is this capacity to plan, to make relatively sophisticated plans on the basis of models of the world, where those plans are being evaluated according to criteria. That planning capability needs to be driving the model's behavior. There are models that are, in some sense, capable of planning, but when they give an output, that output wasn't determined by some process of planning, of asking: here's what will happen if I give this output, and do I want that to happen? The model also needs to really understand the world. It needs to be able to say: okay, here's what will happen; here I am; here's my situation; here's the politics of the situation.
Really having this situational awareness to be able to evaluate the consequences of different plans. I think the other thing is that the verbal behavior of these models need bear no relation to their values. When I talk about a model's values, I'm talking about the criteria that end up determining which plans the model pursues, right?
And a model's verbal behavior, even if it has a planning process, which I think GPT-4 doesn't in many cases, just doesn't need to reflect those criteria. And we know that we're going to be able to get models to say what we want to hear. That is the magic of gradient descent.
Modulo some difficulties with capabilities, you can get a model to output the behavior that you want, and if it doesn't, you crank on it until it does. And I think everyone admits that suitably sophisticated models are going to have a very detailed understanding of human morality.
But the question is what relationship there is between a model's verbal behavior, which you've essentially clamped, saying the model must say such-and-such things, and the criteria that end up influencing its choice between plans. And there, I'm pretty cautious about concluding that, well, when it says the thing I forced it to say...