Dario Amodei
๐ค SpeakerAppearances Over Time
Podcast Appearances
And often we find that models have properties that we're not aware of. The fact of the matter is that you can talk to a model 10,000 times and there are some behaviors you might not see. Just like with a human, right? I can know someone for a few months and not know that they have a certain skill or not know that there's a certain side to them. And so I think we just have to get used to this idea.
And we're always looking for better ways of testing our models to demonstrate these capabilities. And And also to decide which are the personality properties we want models to have and which we don't want to have. That itself, the normative question, is also super interesting.
And we're always looking for better ways of testing our models to demonstrate these capabilities. And And also to decide which are the personality properties we want models to have and which we don't want to have. That itself, the normative question, is also super interesting.
And we're always looking for better ways of testing our models to demonstrate these capabilities. And And also to decide which are the personality properties we want models to have and which we don't want to have. That itself, the normative question, is also super interesting.
From Reddit. Oh, boy.
From Reddit. Oh, boy.
From Reddit. Oh, boy.
So this actually doesn't apply. This isn't just about Claude. I believe I've seen these complaints for every foundation model produced by a major company. People said this about GPT-4. They said it about GPT-4 Turbo. So a couple things. One, the actual weights of the model, right, the actual brain of the model, that does not change unless we introduce a new model.
So this actually doesn't apply. This isn't just about Claude. I believe I've seen these complaints for every foundation model produced by a major company. People said this about GPT-4. They said it about GPT-4 Turbo. So a couple things. One, the actual weights of the model, right, the actual brain of the model, that does not change unless we introduce a new model.
So this actually doesn't apply. This isn't just about Claude. I believe I've seen these complaints for every foundation model produced by a major company. People said this about GPT-4. They said it about GPT-4 Turbo. So a couple things. One, the actual weights of the model, right, the actual brain of the model, that does not change unless we introduce a new model.
There are just a number of reasons why it would not make sense practically to be randomly substituting in substituting in new versions of the model. It's difficult from an inference perspective, and it's actually hard to control all the consequences of changing the weights of the model.
There are just a number of reasons why it would not make sense practically to be randomly substituting in substituting in new versions of the model. It's difficult from an inference perspective, and it's actually hard to control all the consequences of changing the weights of the model.
There are just a number of reasons why it would not make sense practically to be randomly substituting in substituting in new versions of the model. It's difficult from an inference perspective, and it's actually hard to control all the consequences of changing the weights of the model.
Let's say you wanted to fine tune the model to be like, I don't know, to like, to say certainly less, which, you know, an old version of Sonnet used to do. You actually end up changing a hundred things as well. So we have a whole process for it. And we have a whole process for
Let's say you wanted to fine tune the model to be like, I don't know, to like, to say certainly less, which, you know, an old version of Sonnet used to do. You actually end up changing a hundred things as well. So we have a whole process for it. And we have a whole process for
Let's say you wanted to fine tune the model to be like, I don't know, to like, to say certainly less, which, you know, an old version of Sonnet used to do. You actually end up changing a hundred things as well. So we have a whole process for it. And we have a whole process for
modifying the model we do a bunch of testing on it we do a bunch of um like we do a bunch of user testing and early customers so it we both have never changed the weights of the model without without telling anyone and it it wouldn't certainly in the current setup it would not make sense to do that now there are a couple things that we do occasionally do um one is sometimes we run ab tests um
modifying the model we do a bunch of testing on it we do a bunch of um like we do a bunch of user testing and early customers so it we both have never changed the weights of the model without without telling anyone and it it wouldn't certainly in the current setup it would not make sense to do that now there are a couple things that we do occasionally do um one is sometimes we run ab tests um
modifying the model we do a bunch of testing on it we do a bunch of um like we do a bunch of user testing and early customers so it we both have never changed the weights of the model without without telling anyone and it it wouldn't certainly in the current setup it would not make sense to do that now there are a couple things that we do occasionally do um one is sometimes we run ab tests um
Um, but those are typically very close to when a model is being, is being, uh, released and for a very small fraction of time. Um, so, uh, you know, like the, you know, the, the day before the new sonnet 3.5, I agree. We should have had a better name. It's clunky to refer to it. Um, there were some comments from people that like, it's got, it's got, it's gotten a lot better.