Dario Amodei
On the inside, there's no reason why the models should be designed for us to understand them, right? They're designed to operate. They're designed to work, just like the human brain or human biochemistry. They're not designed for a human to open up the hatch, look inside and understand them.
But we have found, and you can talk to Chris about this in much more detail, that when we open them up, when we do look inside them, we find things that are surprisingly interesting.
I'm amazed at how clean it's been. I'm amazed at things like induction heads. I'm amazed that we can use sparse autoencoders to find these directions within the networks, and that the directions correspond to these very clear concepts, right? We demonstrated this a bit with the Golden Gate Bridge Claude.
So this was an experiment where we found a direction inside one of the neural network's layers that corresponded to the Golden Gate Bridge. And we just turned that way up. And so we released this model as a demo. It was kind of half a joke for a couple of days, but it was illustrative of the method we developed.
And you could take the model, you could ask it about anything. You could say, how was your day? And anything you asked, because this feature was activated, it would connect to the Golden Gate Bridge. So it would say, you know, I'm feeling relaxed and expansive, much like the arches of the Golden Gate Bridge.
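As a rough illustration of what "turning a direction way up" can mean, here is a toy sketch in plain Python. It is not the method used in the demo described above: the vectors, the names `bridge_direction`, `feature_activation`, and `steer`, and the target value are all hypothetical, and a real model would apply something like this to the activations of one layer rather than to a four-number list.

```python
import math


def normalize(vector):
    """Return the unit-length version of a vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def feature_activation(hidden, direction):
    """How strongly the feature is present: projection of the
    hidden state onto the unit-length feature direction."""
    unit = normalize(direction)
    return sum(h * u for h, u in zip(hidden, unit))


def steer(hidden, direction, target):
    """Move the hidden state along the feature direction until its
    projection equals `target`; everything orthogonal is unchanged."""
    unit = normalize(direction)
    current = feature_activation(hidden, direction)
    return [h + (target - current) * u for h, u in zip(hidden, unit)]


# Hypothetical 4-dimensional hidden state and feature direction.
hidden = [0.5, -1.0, 2.0, 0.0]
bridge_direction = [0.0, 3.0, 4.0, 0.0]

# Clamp the feature to a value far above its natural activation.
steered = steer(hidden, bridge_direction, target=10.0)

print(round(feature_activation(hidden, bridge_direction), 6))
print(round(feature_activation(steered, bridge_direction), 6))
```

The design choice in this sketch is to clamp the projection to a fixed target rather than add a constant offset, so the feature ends up strongly active whatever the input was. That is one plausible reading of "turned that way up", stated here as an assumption.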
Somehow these interventions on the model, where you kind of adjust its behavior, somehow emotionally made it seem more human than any other version of the model.
Strong personality. Strong personality. It has these kind of like obsessive interests. You know, we can all think of someone who's like obsessed with something. So it does make it feel somehow a bit more human. Let's talk about the present. Let's talk about Claude. So this year, Claude
Yeah. So let's go back to March when we first released these three models. So our thinking was different companies produce kind of large and small models, better and worse models.
We felt that there was demand both for a really powerful model that might be a little bit slower, that you'd have to pay more for, and also for fast, cheap models that are as smart as they can be for how fast and cheap they are, right?
Whenever you want to do some kind of difficult analysis, like if I want to write code, for instance, or I want to brainstorm ideas or I want to do creative writing, I want the really powerful model. But then there are a lot of practical applications in a business sense where it's like, I'm interacting with a website.
I'm doing my taxes or I'm talking to a legal advisor and I want to analyze a contract. Or we have plenty of companies that are just like, I want to do autocomplete on my IDE or something. And for all of those things, you want to act fast and you want to use the model very broadly. So we wanted to serve that whole spectrum of needs.
So we ended up with this kind of poetry theme. And so what's a really short poem? It's a haiku. And so Haiku is the small, fast, cheap model that was, at the time, really surprisingly intelligent for how fast and cheap it was. A sonnet is a medium-sized poem, right? A couple paragraphs. And so Sonnet was the middle model.
It is smarter, but also a little bit slower, a little bit more expensive. And Opus, as in a magnum opus, a large work, was the largest, smartest model at the time. So that was the original kind of thinking behind it. Yeah. And our thinking then was, well, each new generation of models should shift that trade-off curve.
So when we released Sonnet 3.5, it had roughly the same cost and speed as the Sonnet 3 model, but it increased its intelligence to the point where it was smarter than the original Opus 3 model, especially for code, but also just in general.
And so now we've shown results for Haiku 3.5, and I believe Haiku 3.5, the smallest new model, is about as good as Opus 3, the largest old model. So basically, the aim here is to shift the curve, and then at some point, there's going to be an Opus 3.5. Now, every new generation of models has its own thing. They use new data.
Their personality changes in ways that we kind of try to steer but are not fully able to steer. And so there's never quite that exact equivalence where the only thing you're changing is intelligence. We always try and improve other things, and some things change without us knowing or measuring. So it's very much an inexact science.
In many ways, the manner and personality of these models is more an art than it is a science.
Yeah, so there's different processes. There's pre-training, which is just kind of the normal language model training. And that takes a very long time. These days, that uses tens of thousands, sometimes many tens of thousands, of GPUs or TPUs or Trainium. We use different platforms, but, you know, accelerator chips, often training for months.
There's then a kind of post-training phase where we do reinforcement learning from human feedback, as well as other kinds of reinforcement learning. That phase is getting larger and larger now. And, you know, often that's less of an exact science. It often takes effort to get it right.