Chris Olah
But yeah, as the systems get more capable, this stuff gets more stressful, because right now they're not smart enough to pose any issues. But I think over time it's going to start to feel like a more serious kind of stress.
I think I use that partly. And then obviously people can send us feedback, both positive and negative, about things the model has done, and we can get a sense of areas where it's falling short. Internally, people work with the models a lot and try to figure out areas where there are gaps.
And so I think it's this mix of interacting with it myself, seeing people internally interact with it, and then the explicit feedback we get. And then, you know, it's also hard to know, but if people on the internet say something about Claude and I see it, I'll take that seriously too. I don't know.
I'm pretty sympathetic, in that they're in this difficult position where they have to judge whether something actually seems risky or potentially harmful to you. So they're having to draw this line somewhere.
And if they draw it too far in the direction of, you know, "I'm imposing my ethical worldview on you," that seems bad. So in many ways, I like to think we've actually seen improvements on this across the board, which is kind of interesting because it coincides with, for example, adding more character training.
And I think my hypothesis was always that the good character isn't, again, one that's just moralistic. It's one that respects you, your autonomy, and your ability to choose what is good and right for you, within limits. There's sometimes this concept of corrigibility to the user: just being willing to do anything that the user asks.
And if the models were willing to do that, then they would be easily misused. At that point, you're just saying that the ethics of the model and what it does are completely the ethics of the user.