Dario Amodei

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

And so often, if I ask the model a question, if I'm like, do task X versus can you do task X, the model might respond in different ways. And so there are all kinds of subtle things that you can change about the way you interact with the model that can give you very different results.

3138.283

Lex Fridman Podcast

To be clear, this itself is like a failing by us and by the other model providers that the models are just often sensitive to like small changes in wording. It's yet another way in which the science of how these models work is very poorly developed.

3158.928

And so if I go to sleep one night and I was talking to the model in a certain way and I slightly change the phrasing of how I talk to the model, I could get different results. So that's one possible way. The other thing is, man, it's just hard to quantify this stuff. It's hard to quantify this stuff. I think people are very excited by new models when they come out.

3174.964

And then as time goes on, they become very aware of the limitations. So that may be another effect. But that's all a very long-winded way of saying, for the most part, with some fairly narrow exceptions, the models are not changing.

3194.545

And now I'm like, I can't get this thing to work. This is such a piece of crap.

3218.297

Yeah. So a couple points on this first. One is like things that people say on Reddit and Twitter or X or whatever it is. There's actually a huge distribution shift between like the stuff that people complain loudly about on social media and what actually kind of like statistically users care about and that drives people to use the models.

3251.906

People are frustrated with things like the model not writing out all the code or the model just not being as good at code as it could be, even though it's the best model in the world on code. I think the majority of things are about that, but certainly a kind of vocal minority, you know, kind of raise these concerns, right?

3272.875

Are frustrated by the model refusing things that it shouldn't refuse, or apologizing too much, or just having these kind of annoying verbal tics. The second caveat, and I just want to say this super clearly because I think some people don't know it, and others kind of know it but forget it.

3296.135

Like it is very difficult to control across the board how the models behave, right? You cannot just reach in there and say, oh, I want the model to like apologize less. Like you can do that. You can include training data that says like, oh, the model should like apologize less.

3314.231

But then in some other situation, they end up being like super rude or like overconfident in a way that's like misleading people. So there are all these trade-offs, right? For example, another thing is there was a period during which models, ours and I think others as well, were too verbose, right? They would like repeat themselves. They would say too much.

3329.785

You can cut down on the verbosity by penalizing the models for just talking for too long. What happens when you do that, if you do it in a crude way, is when the models are coding, sometimes they'll say, rest of the code goes here, right? Because they've learned that that's a way to economize and that they see it.

3351.217

So that leads the model to be so-called lazy in coding, where they're just like, ah, you can finish the rest of it. It's not because we want to save on compute or because the models are lazy during winter break or any of the other kind of conspiracy theories that have come up.

3368.324

It's actually just very hard to control the behavior of the model, to steer the behavior of the model in all circumstances at once. There's this whack-a-mole aspect where you push on one thing and these other things start to move as well that you may not even notice or measure.

3385.737

And so one of the reasons that I care so much about, you know, kind of grand alignment of these AI systems in the future is, these systems are actually quite unpredictable. They're actually quite hard to steer and control. And this version we're seeing today, of you make one thing better, it makes another thing worse,

3406.412

I think that's like a present-day analog of future control problems in AI systems that we can start to study today. Right. I think that difficulty in steering the behavior, and in making sure that if we push an AI system in one direction, it doesn't push it in another direction in some other ways that we didn't want,

3429.116

I think that's kind of an early sign of things to come. And if we can do a good job of solving this problem, right, of like, you ask the model to, you know, make and distribute smallpox and it says no, but it's willing to help you in your graduate-level virology class. Like, how do we get both of those things at once? It's hard.

3453.321

It's very easy to go to one side or the other. And it's a multidimensional problem. And so I think these questions of shaping the model's personality, I think they're very hard. I think we haven't done perfectly on them. I think we've actually done the best of all the AI companies, but still so far from perfect.

3474.028

And I think if we can get this right, if we can control the false positives and false negatives in this very kind of controlled present-day environment, we'll be much better at doing it for the future, when our worry is, will the models be super autonomous? Will they be able to make very dangerous things? Will they be able to autonomously build whole companies, and are those companies aligned?

3494.584

So I think of this present task as both vexing, but also good practice for the future.

3519.624

So typically we'll have internal model bashings where all of Anthropic, Anthropic is almost a thousand people. You know, people just try and break the model. They try and interact with it various ways. We have a suite of evals for, you know, oh, is the model refusing in ways that it couldn't?

3544.792
