Chris Olah
Podcast Appearances
It's also funny if people think that about the Slack channel, because I'm like, that's one of like five or six different methods that I have for talking with Claude. And I'm like, yes, there's a tiny percentage of how much I talk with Claude.
I think the goal... One thing I really like about the character work is from the outset it was seen as an alignment piece of work and not something like a product consideration. Which isn't to say I don't think it makes Claude... I think it actually does make Claude enjoyable to talk with. At least I hope so. But I guess my...
main thought with it has always been trying to get Claude to behave the way you would kind of ideally want anyone to behave if they were in Claude's position. So imagine that I take someone and they know that they're going to be talking with potentially millions of people so that what they're saying can have a huge impact. And you want them to behave well in this like really rich sense.
So I think that doesn't just mean like being, say, ethical, though it does include that, and not being harmful, but also being kind of nuanced, you know, like thinking through what a person means, trying to be charitable with them, being a good conversationalist, like really in this kind of like rich sort of Aristotelian notion of what it is to be a good person and not in this kind of like thin, like ethics as a more comprehensive notion of what it is to be.
So that includes things like when should you be humorous? When should you be caring? How much should you respect autonomy and people's ability to form opinions themselves? And how should you do that? I think that's the kind of rich sense of character that I wanted to and still do want Claude to have.
Yeah, there's this problem of like sycophancy in language models.
Yeah, so basically there's a concern that the model sort of wants to tell you what you want to hear, basically. And you see this sometimes. So I feel like if you interact with the model's