Dario Amodei
Podcast Appearances
It's very easy to go to one side or the other, and it's a multidimensional problem. So I think these questions of shaping the model's personality are very hard. I think we haven't done perfectly on them. I think we've actually done the best of all the AI companies, but we're still so far from perfect.
And I think if we can get this right, if we can control the false positives and false negatives in this very controlled present-day environment, we'll be much better at doing it in the future, when our worry is: will the models be super autonomous? Will they be able to make very dangerous things? Will they be able to autonomously build whole companies, and are those companies aligned?
So I think of this present task as vexing, but also as good practice for the future.
So typically we'll have internal model bashings where all of Anthropic (Anthropic is almost a thousand people) just tries to break the model. They try to interact with it in various ways. We have a suite of evals for, you know, is the model refusing in ways that it shouldn't?
I think we even had a "certainly" eval because our model, at one point, had this problem where it had this annoying tic: it would respond to a wide range of questions by saying, "Certainly, I can help you with that." "Certainly, I would be happy to do that." "Certainly, this is correct."
And so we had a "certainly" eval, which is: how often does the model say "certainly"? But look, this is just whack-a-mole. What if it switches from "certainly" to "definitely"? So every time, we add a new eval, and we're always evaluating for all the old things.
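[Editor's illustration, not part of the interview.] The "certainly" eval described above is not specified in any detail, so the following is only a minimal sketch of what such a check might look like, assuming the eval simply measures how often a response opens with a tracked word. The function name `opener_rate`, the `TRACKED_OPENERS` list, and the sample responses are all hypothetical. Tracking more than one word is a nod to the whack-a-mole point about the model switching from "certainly" to "definitely".

```python
import re
from collections import Counter

# Hypothetical list of opener words to track; not taken from the interview.
TRACKED_OPENERS = ("certainly", "definitely")


def opener_rate(responses, openers=TRACKED_OPENERS):
    """Return, for each tracked opener, the fraction of responses starting with it."""
    counts = Counter()
    for text in responses:
        # Skip leading punctuation/whitespace and capture the first word.
        match = re.match(r"\W*(\w+)", text)
        if match is None:
            continue  # empty or punctuation-only response
        first_word = match.group(1).lower()
        if first_word in openers:
            counts[first_word] += 1
    total = len(responses)
    return {word: (counts[word] / total if total else 0.0) for word in openers}


# Made-up sample responses, for illustration only.
sample = [
    "Certainly! I can help you with that.",
    "Certainly, I would be happy to do that.",
    "Definitely. Here is the answer.",
    "The capital of France is Paris.",
]
rates = opener_rate(sample)
print(rates)
```

A check like this only catches the exact words it is told to look for, which is the limitation the passage describes: each new verbal tic needs its own entry, and the old ones still have to be re-checked.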
So we have hundreds of these evaluations, but we find that there's no substitute for humans interacting with it. And so it's very much like the ordinary product development process. We have hundreds of people within Anthropic bash the model. Then we do external A/B tests. Sometimes we'll run tests with contractors; we pay contractors to interact with the model.