Nick Heiner
Podcast Appearances
And when they see ChatGPT asking for that same information for free, some of them have actually complained, you know. I mean, they're just sort of venting.
It's not a serious thing, but yeah.
But yes, that is exactly what they're doing: they're gathering training data.
Yeah, I guess like, you know, if you ask me to learn how to golf just by watching YouTube videos, like I think I would truly struggle to do that.
However, the reason it's taken us this long to get this far is in part because it is substantially more complicated to build a training golf course for you than it is to give you an iPad and say, here's a thousand hours of Tiger Woods.
Yeah, so reinforcement learning is a technique that's applied during post-training.
Pre-training is the step where you basically have the model read, you know, the whole internet.
It's not actually the whole internet, but you just shove a bunch of tokens through the model.
And that's what gives you a prior on language.
It's what gives you, like, knowledge.
And then post-training is what gives you behavior.
So, you know, the earliest example was, if you remember, in 2020 GPT-3 came out, and it was not a chat-tuned model. Like, it would not have a conversation with you.
It would literally just, you know, your prompt is a document, and it just writes whatever it thinks the rest of the document is going to be.
So the way to sort of induce it into having a conversation with you might be: you say Q and then write your question, and then you say A and leave a spot for its response.
But half the time, what it would do is it would write your answer and then it would write a question of its own.
Because, you know, many documents are in a Q&A format or whatever.
Right, right.
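[Editor's sketch, not part of the conversation: the Q/A framing described above can be illustrated in a few lines of Python. The function names and the sample completion string are hypothetical and made up for illustration; no real model or API is called. The idea is to frame the question as the start of a Q&A document, then cut the continuation at the point where the model starts writing a question of its own.]

```python
def build_qa_prompt(question: str) -> str:
    """Frame a question as the start of a Q&A document for a base model to continue."""
    return f"Q: {question.strip()}\nA:"


def trim_completion(completion: str, stop: str = "\nQ:") -> str:
    """Keep only the first answer, cutting where the model begins a new question."""
    answer, _, _ = completion.partition(stop)
    return answer.strip()


prompt = build_qa_prompt("What is the capital of France?")

# Hypothetical raw continuation (an assumption, not real model output):
# an answer followed by a follow-up question the model invented.
raw = " Paris.\nQ: What is the capital of Spain?\nA: Madrid."

print(prompt)
print(trim_completion(raw))
```

[Cutting at a stop string like this is one common workaround for a base model that keeps extending the document; chat-tuned models make it largely unnecessary.]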
The other issue with a raw pre-trained model is that you are not going to have sort of the safety standards that you want.
And so that's another thing that's applied during post-training.