Dario Amodei

👤 Speaker
1816 total appearances

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So I can only describe it as it relates to kind of my own experience, but I've been in the AI field for about 10 years. And it was something I noticed very early on. So I first joined the AI world when I was working at Baidu with Andrew Ng in late 2014, which is almost exactly 10 years ago now. And the first thing we worked on was speech recognition systems.

And in those days, I think deep learning was a new thing. It had made lots of progress, but everyone was always saying, we don't have the algorithms we need to succeed. You know, we're not, we're only matching a tiny, tiny fraction. There's so much we need to kind of discover algorithmically. We haven't found the picture of how to match the human brain.

And, you know, in some ways it was fortunate. I was kind of, you know, you can have almost beginner's luck, right? I was like a newcomer to the field. And, you know, I looked at the neural net that we were using for speech, the recurrent neural networks. And I said, I don't know, what if you make them bigger and give them more layers?

And what if you scale up the data along with this, right? I just saw these as like independent dials that you could turn. And I noticed that the model started to do better and better as you gave them more data, as you made the models larger, as you trained them for longer.

And I didn't measure things precisely in those days, but along with colleagues, we very much got the informal sense that the more data and the more compute and the more training you put into these models, the better they perform. And so initially my thinking was, hey, maybe that is just true for speech recognition systems, right? Maybe that's just one particular quirk, one particular area.

I think it wasn't until 2017, when I first saw the results from GPT-1, that it clicked for me that language is probably the area in which we can do this. We can get trillions of words of language data. We can train on them. And the models we trained in those days were tiny.

You could train them on one to eight GPUs, whereas, you know, now we train jobs on tens of thousands, soon going to hundreds of thousands of GPUs. And so when I saw those two things together, and, you know, there were a few people like Ilya Sutskever, who you've interviewed, who had somewhat similar views, right?

He might have been the first one, although I think a few people came to similar views around the same time, right? There was, you know, Rich Sutton's bitter lesson. Gwern wrote about the scaling hypothesis. But I think somewhere between 2014 and 2017 was when it really clicked for me, when I really got conviction that, hey, we're going to be able to do these incredibly wide cognitive tasks if we just scale up the models. And at every stage of scaling, there are always arguments. And when I first heard them, honestly, I thought, probably I'm the one who's wrong. And all these experts in the field are right. They know the situation better than I do. There's the Chomsky argument about you can get syntactics, but you can't get semantics.

There was this idea, oh, you can make a sentence make sense, but you can't make a paragraph make sense. You know, we're going to run out of data or the data isn't high quality enough or models can't reason. And each time, every time, we manage to either find a way around or scaling just is the way around. Sometimes it's one, sometimes it's the other.

And so I'm now at this point, I still think, you know, it's always quite uncertain. We have nothing but inductive inference to tell us that the next few years are going to be like the last 10 years. But I've seen the movie enough times.

I've seen the story happen for enough times to really believe that probably the scaling is going to continue and that there's some magic to it that we haven't really explained on a theoretical basis yet.

Yes. All of those. In particular, linear scaling up of bigger networks, bigger training times, and more data. So all of these things, almost like a chemical reaction. You have three ingredients in the chemical reaction, and you need to linearly scale up the three ingredients. If you scale up one, not the others, you run out of the other reagents and the reaction stops.

But if you scale up everything in series, then the reaction can proceed.

Yeah, we've documented scaling laws in lots of domains other than language, right? So initially, the paper we did that first showed it was in early 2020, where we first showed it for language. There was then some work late in 2020 where we showed the same thing for other modalities like images, video, text to image, image to text, math, that they all had the same pattern. And you're right.

Now, there are other stages like post-training or there are new types of reasoning models. And in all of those cases that we've measured, we see similar types of scaling laws.

So in my previous career as a biophysicist, so I did physics undergrad and then biophysics in grad school. So I think back to what I know as a physicist, which is actually much less than what some of my colleagues at Anthropic have in terms of expertise in physics.

There's this concept called 1/f noise and 1/x distributions, where often, you know, just like if you add up a bunch of natural processes, you get a Gaussian, if you add up a bunch of kind of differently distributed natural processes, if you, like, take a probe and hook it up to a resistor, the distribution of the thermal noise in the resistor goes as one over the frequency.

It's some kind of natural convergent distribution. And I think what it amounts to is that if you look at a lot of things that are produced by some natural process that has a lot of different scales, right? Not a Gaussian, which is kind of narrowly distributed.