Dario Amodei
But then in some other situation, they end up being super rude, or overconfident in a way that misleads people. So there are all these trade-offs. For example, there was a period during which models, ours and I think others' as well, were too verbose. They would repeat themselves. They would say too much.
You can cut down on the verbosity by penalizing the models for talking too long. What happens when you do that, if you do it in a crude way, is that when the models are coding, sometimes they'll say, "rest of the code goes here," because they've learned that that's a way to economize.
So that leads to the model being so-called lazy in coding, where it just says, ah, you can finish the rest of it. It's not because we want to save on compute, or because the models are lazy during winter break, or any of the other conspiracy theories that have come up.
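The mechanism being described can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual training setup: assume a scalar reward that is just a task score minus a flat per-token penalty (the function name, scores, and penalty value are all invented for this example). A truncated answer can then out-score a complete one, so an optimizer learns to elide code.

```python
# Hypothetical sketch: a crude per-token length penalty folded into a
# reward signal. Not a real training pipeline; the numbers are invented.

def crude_reward(task_score: float, num_tokens: int, penalty: float = 0.01) -> float:
    """Task score minus a flat cost per token emitted."""
    return task_score - penalty * num_tokens

# A complete 400-token answer with a perfect task score, vs. a truncated
# 80-token answer ("rest of the code goes here") that scores a bit worse
# on the task but pays far less length penalty:
full_answer = crude_reward(task_score=1.0, num_tokens=400)
lazy_answer = crude_reward(task_score=0.8, num_tokens=80)

assert lazy_answer > full_answer  # the crude penalty rewards truncation
```

Under this toy objective, the truncated answer wins even though it is worse at the task, which is exactly the "lazy coding" incentive described above; a less crude setup would penalize only redundant text rather than all tokens equally.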
It's just very hard to control the behavior of the model, to steer it in all circumstances at once. There's this whack-a-mole aspect where you push on one thing and these other things start to move as well, things you may not even notice or measure.
And so one of the reasons that I care so much about grand alignment of these AI systems in the future is that these systems are actually quite unpredictable. They're quite hard to steer and control. And this version we're seeing today, where you make one thing better and it makes another thing worse?
I think that's a present-day analog of future control problems in AI systems that we can start to study today. That difficulty in steering the behavior, in making sure that if we push an AI system in one direction, it doesn't also push it in some other direction that we didn't want.
I think that's kind of an early sign of things to come. And we need to do a good job of solving this problem: you ask the model to make and distribute smallpox and it says no, but it's willing to help you in your graduate-level virology class. How do we get both of those things at once? It's hard.
It's very easy to go to one side or the other, and it's a multidimensional problem. So I think these questions of shaping the model's personality are very hard. I think we haven't done perfectly on them. I think we've actually done the best of all the AI companies, but we're still far from perfect.