Chris Olah
Podcast Appearances
And kind of like walking that line between convincing someone and just trying to like talk at them versus like drawing out their views, like listening and then offering kind of counter considerations. Yeah. And it's hard.
I think it's actually a hard line where it's like, where are you trying to convince someone versus just offering them like considerations and things for them to think about so that you're not actually like influencing them. You're just like letting them reach wherever they reach. And that's like a line that is difficult, but that's the kind of thing that language models have to try and do.
Yeah, I think that most of the time when I'm talking with Claude, I'm trying to kind of map out its behavior in part. Like obviously I'm getting like helpful outputs from the model as well. But in some ways, this is like how you get to know a system, I think, is by like probing it and then augmenting like, you know, the message that you're sending and then checking the response to that.
So in some ways it's like how I map out the model. I think that people focus a lot on these quantitative evaluations of models. And this is a thing that I've said before, but I think in the case of language models, a lot of the time, each interaction you have is actually quite high information. It's very predictive of other interactions that you'll have with the model.
And so I guess I'm like, if you talk with a model hundreds or thousands of times, this is almost like a huge number of really high quality data points about what the model is like, in a way that lots of very similar but lower quality conversations just aren't. Or questions that are just mildly augmented, where you have thousands of them, might be less relevant than 100 really well-selected questions.
I think it's almost like everything. Because I want like a full map of the model, I'm kind of trying to do the whole spectrum of possible interactions you could have with it. So like one thing that's interesting about Claude, and this might actually get to some interesting issues with RLHF, which is if you ask Claude for a poem,
I think that a lot of models, if you ask them for a poem, the poem is like fine. You know, usually it kind of like rhymes and it's, you know, so if you say like, give me a poem about the sun, it will be like, yeah, it'll just be a certain length. It'll like rhyme. It will be fairly kind of benign.
And I've wondered before, is it the case that what you're seeing is kind of like the average? It turns out, you know, if you think about people who have to talk to a lot of people and be very charismatic, one of the weird things is that I'm like, well, they're kind of incentivized to have these extremely boring views. Because if you have really interesting views, you're divisive.