Dario Amodei
You know, if other companies are doing this that look more responsible, they want to look more responsible, too. No one wants to look like the irresponsible actor. And so they adopt this as well. When folks come to Anthropic, interpretability is often a draw. And I tell them: the other places you didn't go, tell them why you came here.
You see, soon, that there are interpretability teams elsewhere as well. And in a way, that takes away our competitive advantage, because it's like, oh, now others are doing it as well, but it's good for the broader system. And so we have to invent some new thing that we're doing that others aren't doing, and the hope is to basically bid up the importance of doing the right thing.
And it's not about us in particular, right? It's not about having one particular good guy. Other companies can do this as well. If they join the race to do this, that's the best news ever, right? It's about kind of shaping the incentives to point upward instead of shaping the incentives to point downward.
Trying to. I mean, I think we're still early in terms of our ability to see things, but I've been surprised at how much we've been able to look inside these systems and understand what we see, right? Unlike with the scaling laws, where it feels like there's some law that's driving these models to perform better,
On the inside, you know, there's no reason why the models should be designed for us to understand them, right? They're designed to work, just like the human brain or human biochemistry. They're not designed for a human to open up the hatch, look inside, and understand them.
But we have found, and, you know, you can talk in much more detail about this to Chris, that when we open them up, when we do look inside them, we find things that are surprisingly interesting.
I'm amazed at how clean it's been. I'm amazed at things like induction heads. I'm amazed that we can use sparse autoencoders to find these directions within the networks, and that the directions correspond to very clear concepts, right? We demonstrated this a bit with Golden Gate Claude.
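The idea mentioned here can be sketched in a few lines: a sparse autoencoder represents a model's activation vector as a sparse combination of learned "feature directions," where each direction ideally corresponds to one interpretable concept. The toy below is a minimal illustration only, not Anthropic's actual method: it uses a fixed random dictionary and a tied encoder (real sparse autoencoders learn both, trained with a reconstruction loss plus an L1 sparsity penalty), and all dimensions and indices are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from any real model): the activation
# size, and an overcomplete dictionary with more features than dimensions.
d_model, n_features = 64, 256

# Pretend dictionary: each row is a unit-norm "concept direction."
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

def encode(x, threshold=0.5):
    # Project the activation onto every direction, then apply a ReLU with a
    # bias so only strongly aligned features fire -- this produces sparsity.
    return np.maximum(W_dec @ x - threshold, 0.0)

def decode(f):
    # Reconstruct the activation as a sparse linear combination of directions.
    return f @ W_dec

# A fake "activation": mostly concept 3's direction plus a little of concept 7's.
x = 2.0 * W_dec[3] + 0.5 * W_dec[7]

f = encode(x)
print("active features:", np.count_nonzero(f), "of", n_features)
print("strongest feature:", int(np.argmax(f)))
```

Despite the dictionary being random, only a handful of features fire, and the strongest one is the direction the activation was built from; the interpretability result being described is that in real trained networks, such directions line up with human-recognizable concepts like "the Golden Gate Bridge."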