Chris Olah
What could I have said that would make you not make that error? Write that out as an instruction. And I'm going to give it to the model. I'm going to try it. Sometimes I do that: I often give that to the model in another context window, I take the response, I give it to Claude, and I'm like, hmm, that didn't work, can you think of anything else? You can play around with these things quite a lot.
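The iteration loop described here can be sketched in code. This is just an illustration of the workflow, not any real API: `ask_model` is a hypothetical stand-in for whatever chat client you use, and `passes` is your own check for whether the error recurred.

```python
# Sketch of the loop: ask the model what instruction would have prevented its
# error, try that instruction on the task in a fresh context, and if it fails,
# feed the failure back and ask for another idea.

def refine_instruction(ask_model, passes, task, max_rounds=5):
    """Iteratively ask the model for a preventive instruction and test it."""
    instruction = ask_model(
        "What could I have said that would make you not make that error? "
        "Write that out as an instruction."
    )
    for _ in range(max_rounds):
        # Try the candidate instruction in a fresh context window.
        response = ask_model(f"{instruction}\n\n{task}")
        if passes(response):
            return instruction
        # Hmm, didn't work. Can you think of anything else?
        instruction = ask_model(
            f"The instruction {instruction!r} didn't work. "
            "Can you think of anything else?"
        )
    return None
```

The point is just that each round is a fresh context window, so the candidate instruction is tested on its own rather than with the failed conversation still in view.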
I think there's just a huge amount of information in the data that humans provide, like when we provide preferences, especially because different people are going to pick up on really subtle and small things. I've thought about this before: you probably have some people who just really care about models using good grammar, like, you know, was a semicolon used correctly or something?
And so you'll probably end up with a bunch of data in there where you, as a human looking at it, wouldn't even see that. You'd be like, why did they prefer this response to that one? I don't get it. And the reason is that you don't care about semicolon usage, but that person does.
And so each of these single data points carries something like that, and the model has so many of them that it has to try and figure out what it is that humans want in this really complex, across-all-domains way. It's going to see this across many contexts. It feels like the classic issue of deep learning, where historically we tried to do things like edge detection by mapping the features out by hand.
And it turns out that if you just have a huge amount of data that accurately represents the thing you're trying to train the model to learn, that's more powerful than anything else. And so I think...
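The way pairwise preferences become a training signal can be sketched with the standard Bradley-Terry formulation. This is my own minimal illustration using scalar scores as a stand-in for a reward model's outputs, not anyone's actual training code: the loss simply pushes the preferred response's score above the other's, whatever subtle feature (semicolons included) drove the annotator's choice.

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Negative log-probability that the chosen response beats the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def grad_step(r_chosen, r_rejected, lr=1.0):
    """One gradient step on the scalar scores themselves (a stand-in for
    updating model weights): the gradient magnitude is the probability the
    model currently assigns to the *rejected* response winning."""
    p_rejected = 1.0 / (1.0 + math.exp(r_chosen - r_rejected))
    return r_chosen + lr * p_rejected, r_rejected - lr * p_rejected

# Start indifferent between the two responses, then train on the preference.
rc, rr = 0.0, 0.0
for _ in range(10):
    rc, rr = grad_step(rc, rr)
# The scores separate, so the loss on this pair falls below its initial log(2).
```

The model never needs to be told *why* a response was preferred; aggregated over enough pairs, whatever the annotators were tracking shows up in the learned scores.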
One reason is just that you are training the model on exactly the task and with a lot of data that represents many different angles on which people prefer and disprefer responses. I think there is a question of are you eliciting things from pre-trained models or are you teaching new things to models? And in principle, you can teach new things to models in post-training.
I do think a lot of it is eliciting from powerful pre-trained models. People are probably divided on this, because obviously in principle you can definitely teach new things. But I think for the most part, for a lot of the capabilities that we... most use and care about.