
Dario Amodei

Speaker
1367 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

And there are new types of post-training, where the model is trained against itself, that are used every day. So it's not just RLHF; it's a bunch of other methods as well. Post-training, I think, is becoming more and more sophisticated.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

We observed that as well, by the way. There were a couple of very strong engineers here at Anthropic for whom all previous code models, both produced by us and produced by all the other companies, hadn't really been useful. They said, maybe this is useful to a beginner; it's not useful to me. But

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Sonnet 3.5, the original one, for the first time they said, oh my God, this helped me with something that would have taken me hours to do. This is the first model that has actually saved me time. So again, the waterline is rising. And then I think the new Sonnet has been even better. In terms of what it takes, I'll just say it's been across the board.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

It's in the pre-training, it's in the post-training, it's in the various evaluations that we do. We've observed this as well. And if we go into the details of the benchmark: SWE-bench is basically, since you're a programmer, you'll be familiar with pull requests; pull requests are a sort of atomic unit of work.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

You could say, I'm implementing one thing. And so SWE-bench actually gives you a kind of real-world situation where the code base is in its current state and I'm trying to implement something that's described in language.
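For context (this is not from the transcript, and the field names below are hypothetical, not the official SWE-bench schema): a SWE-bench instance pairs a repository snapshot with a task described in natural language, and a candidate patch is typically judged by whether the repository's held-out tests pass. A minimal sketch of that shape:

```python
from dataclasses import dataclass

# Illustrative sketch only: these field names are hypothetical,
# not the official SWE-bench schema.
@dataclass
class SweBenchTask:
    repo: str                # repository the task comes from
    base_commit: str         # the "current state" of the code base
    problem_statement: str   # the change to implement, described in language

def resolved(tests_pass: bool) -> int:
    # An instance counts as resolved only if the held-out tests
    # pass after applying the model's patch.
    return 1 if tests_pass else 0

task = SweBenchTask(
    repo="example/project",
    base_commit="deadbeef",
    problem_statement="Fix the off-by-one error described in the issue.",
)
print(task.repo, resolved(True))
```

The point of the benchmark, as described above, is exactly this pairing: a concrete code base state plus a language description of the change to make.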

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

We have internal benchmarks where we measure the same thing, and you say, just give the model free rein to do anything, run anything, edit anything. How well is it able to complete these tasks? And it's that benchmark that's gone from it can do it 3% of the time to it can do it about 50% of the time.
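The arithmetic behind the 3% to ~50% figure is a simple pass rate. A hypothetical sketch (not Anthropic's actual harness) of how such an agentic eval could be scored:

```python
# Hypothetical sketch of scoring an agentic benchmark: each task gives
# the model free rein (run anything, edit anything), and we record
# whether it completed the task, then report the completed fraction.

def pass_rate(completed):
    """completed: list of booleans, one entry per task."""
    return sum(completed) / len(completed)

# e.g. 3 of 100 tasks solved vs. about 50 of 100 tasks solved
before = pass_rate([True] * 3 + [False] * 97)
after = pass_rate([True] * 50 + [False] * 50)
print(before, after)  # 0.03 0.5
```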

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So I actually do believe that, while you can game benchmarks, if we get to 100% on that benchmark in a way that isn't over-trained or gamed for that particular benchmark, it probably represents a real and serious increase in programming ability.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

And I would suspect that if we can get to 90 to 95%, it will represent the ability to autonomously do a significant fraction of software engineering tasks.