Dario Amodei
Podcast Appearances
Models are then tested with some of our early partners to see how good they are, and they're then tested both internally and externally for their safety, particularly for catastrophic and autonomy risks. So we do internal testing according to our responsible scaling policy, which I could talk more about in detail.
And then we have an agreement with the US and the UK AI Safety Institute, as well as other third-party testers in specific domains, to test the models for what are called CBRN risks: chemical, biological, radiological, and nuclear. We don't think that models pose these risks seriously yet, but every new model we want to evaluate to see if we're starting to get close to some of these more dangerous capabilities.
So those are the phases. And then it just takes some time to get the model working in terms of inference and launching it in the API. So there's just a lot of steps to actually making a model work.
And of course, we're always trying to make the processes as streamlined as possible, right? We want our safety testing to be rigorous, but we want it to be rigorous and to be automatic, to happen as fast as it can without compromising on rigor. Same with our pre-training process and our post-training process. So it's just like building anything else. It's just like building airplanes.
You want to make them safe, but you want to make the process streamlined. And I think the creative tension between those is an important thing in making the models work.
You would be surprised how much of the challenge of building these models comes down to software engineering, performance engineering. From the outside, you might think, oh man, we had this eureka breakthrough, right? You know, this movie with the science, we discovered it, we figured it out.
But I think all things, even incredible discoveries, almost always come down to the details, and often super, super boring details. I can't speak to whether we have better tooling than other companies. I mean, I haven't been at those other companies, at least not recently, but it's certainly something we give a lot of attention to.
Yeah, I think at any given stage, we're focused on improving everything at once. Just naturally, there are different teams. Each team makes progress in a particular area, in making their particular segment of the relay race better. And it's just natural that when we make a new model, we put all of these things in at once.
Yeah, preference data from old models sometimes gets used for new models, although, of course, it performs somewhat better when it's trained on the new models. Note that we have this constitutional AI method, such that we don't only use preference data. There's also a post-training process where we train the model against itself.
And there's, you know, new types of post-training the model against itself that are used every day. So it's not just RLHF, it's a bunch of other methods as well. Post-training, I think, you know, is becoming more and more sophisticated.
We observed that as well, by the way. There were a couple of very strong engineers here at Anthropic for whom all previous code models, both produced by us and produced by all the other companies, hadn't really been useful. They said, maybe this is useful to a beginner. It's not useful to me.
But with Sonnet 3.5, the original one, for the first time they said, oh my God, this helped me with something that would have taken me hours to do. This is the first model that has actually saved me time. So again, the waterline is rising. And then I think the new Sonnet has been even better. In terms of what it takes, I mean, I'll just say it's been across the board.
It's in the pre-training, it's in the post-training, it's in various evaluations that we do. We've observed this as well. And if we go into the details of the benchmark, so SWE-bench is basically, since you're a programmer, you'll be familiar with pull requests, and pull requests are like a sort of atomic unit of work.
You could say, I'm implementing one thing. And so SWE-bench actually gives you kind of a real-world situation where the code base is in the current state and I'm trying to implement something that's described in language.
We have internal benchmarks where we measure the same thing and you say, just give the model free rein to do anything, run anything, edit anything. How well is it able to complete these tasks? And it's that benchmark that's gone from it can do it 3% of the time to it can do it about 50% of the time.
So I actually do believe that if we get, you can game benchmarks, but I think if we get to 100% on that benchmark in a way that isn't kind of over-trained or gamed for that particular benchmark, it probably represents a real and serious increase in programming ability.
And I would suspect that if we can get to 90, 95%, it will represent the ability to autonomously do a significant fraction of software engineering tasks.
Not giving an exact date, but as far as we know, the plan is still to have a Claude 3.5 Opus.
Like Duke Nukem Forever.
You know, it's only been three months since we released the first Sonnet.