Nick Heiner
Podcast Appearances
And when they see ChatGPT asking for that same information for free, some of them have actually complained. I mean, they're just sort of venting.
It's not a serious thing, but yeah.
But yes, that is exactly what they're doing: they're gathering training data.
Yeah, I guess if you asked me to learn how to golf just by watching YouTube videos, I think I would truly struggle.
However, part of the reason it has taken us this long to get this far is that it is substantially more complicated to build a training golf course for you than it is to give you an iPad and say, here's a thousand hours of Tiger Woods.
Yeah, so reinforcement learning is a technique that's applied during post-training.
Pre-training is the step where you basically have the model read the whole internet.
It's not actually the whole internet, but you shove a huge number of tokens through the model.
And that's what gives you a prior on language.
It's what gives you knowledge.
And then post-training is what gives you behavior.
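To make "a prior on language" a little more concrete, here is a drastically simplified sketch: it counts which word follows which in a tiny made-up corpus and turns those counts into next-word probabilities. Real pre-training trains a neural network to predict the next token over an enormous corpus; this bigram count table, the whitespace "tokenizer", and the function name `train_bigram_prior` are all hypothetical stand-ins used only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_prior(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Toy 'pre-training': estimate P(next word | previous word) from raw text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for document in corpus:
        tokens = document.split()  # whitespace split stands in for a real tokenizer
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1  # tally each observed (previous, next) pair

    # Normalize each row of counts into a probability distribution.
    prior: dict[str, dict[str, float]] = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        prior[prev] = {tok: c / total for tok, c in followers.items()}
    return prior


corpus = ["the cat sat", "the cat ran", "the dog sat"]  # made-up toy corpus
prior = train_bigram_prior(corpus)
print(prior["the"])  # cat ≈ 0.667, dog ≈ 0.333
```

The table only says what tends to come next; nothing in it tells the model how to behave in a conversation, which is the gap post-training fills.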
So the earliest example: if you remember, in 2020 GPT-3 came out, and
it was not a chat-tuned model; it would not have a conversation with you.
Your prompt is literally a document, and it just writes whatever it thinks the rest of the document is going to be.
So one way to induce it into having a conversation with you was to write "Q:", then your question, then "A:", and leave a spot for its response.
But half the time, it would write the answer and then go on to write a question of its own.
Because, you know, many documents are in a Q&A format.
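A minimal sketch of the Q&A trick described above, plus the usual workaround for the model running on into its own next question: cut the completion off at the next "Q:". The helper names `build_qa_prompt` and `truncate_at_stop` are hypothetical, and the raw completion string is made up to illustrate the failure mode rather than taken from any real model. Many completion APIs accept a stop sequence that does this truncation for you; the sketch just does it by hand.

```python
def build_qa_prompt(question: str) -> str:
    """Frame a question as a Q&A 'document' for a completion-only model."""
    return f"Q: {question}\nA:"


def truncate_at_stop(completion: str, stop: str = "\nQ:") -> str:
    """Keep only the text before the model starts writing its own next question."""
    cut = completion.find(stop)
    return (completion if cut == -1 else completion[:cut]).strip()


prompt = build_qa_prompt("What year did GPT-3 come out?")
# Made-up continuation: the model answers, then keeps "completing the document"
# by inventing a follow-up question and answer of its own.
raw = " 2020.\nQ: Who made it?\nA: OpenAI."
print(truncate_at_stop(raw))  # 2020.
```

This only hides the symptom; the model still "thinks" it is continuing a document, which is part of why chat behavior ended up being trained in during post-training instead.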
Right, right.
The other issue with a raw pre-trained model is that it's not going to have the safety standards that you want.
So that's another thing applied during post-training.