John Schulman
In pre-training, you're basically training the model to imitate all of the content on the web, including websites and code and so forth.
So you get a model that can basically generate content that looks like random web pages from the internet.
The model is also trained to maximize likelihood, where it has to put a probability on everything.
So the objective is basically predicting the next token given the previous tokens.
Tokens are like words or parts of words.
And since the model has to put a probability on it, and we're training to maximize log probability, it ends up being very calibrated.
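What he's describing is the standard maximum-likelihood objective for next-token prediction. As a hedged sketch (the notation is mine, not quoted from the conversation), for a token sequence x_1, ..., x_T and model parameters θ:

```latex
% Next-token maximum-likelihood objective (illustrative notation, not quoted
% from the transcript): maximize the log-probability the model assigns to
% each token given all of the previous tokens.
\mathcal{L}(\theta) = \sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```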
So it can not only generate all of this content of the web; it can also assign probabilities to everything.
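As a concrete illustration of "assigning probabilities to everything" (this is not from the transcript; it's a minimal sketch assuming the Hugging Face transformers library, with GPT-2 as a stand-in base model), you can score any piece of text under a base LM like so:

```python
# Illustrative sketch (not from the transcript): scoring arbitrary text under
# a base language model by summing per-token log-probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # shape: [1, seq_len, vocab_size]

# Position t predicts token t+1, so shift logits and targets by one.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_lp = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
print("total log-probability of the text:", token_lp.sum().item())
```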
So the base model can effectively take on all of these different personas or generate all these different kinds of content.
And then when we do post-training, we're usually targeting a narrower range of behavior, where we basically want the model to behave like this kind of chat assistant.
And it's a more specific persona where it's trying to be helpful.
It's not trying to imitate a person.
It's answering your questions or doing your tasks.
We're optimizing on a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just trying to imitate this raw content from the web.
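He doesn't name the method here, but a common way to formalize "producing outputs humans will like" is RLHF-style optimization against a learned reward model, with a KL penalty that keeps the model close to its pre-trained behavior. A hedged sketch of that objective (notation assumed, not quoted from the transcript):

```latex
% Common RLHF-style post-training objective (illustrative; the transcript does
% not specify the exact method). r is a reward model trained on human
% preferences; \pi_{\mathrm{ref}} is the reference (pre-trained or supervised)
% policy, and \beta controls how far the tuned policy may drift from it.
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\bigl[\, r(x, y) \,\bigr]
\;-\; \beta\, \mathrm{KL}\!\bigl(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\bigr)
```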
Yeah. Okay. Oh yeah, five years.
Yeah, I think the models will get quite a bit better in the course of five years.
So, I mean, I think even in one or two years, we'll find that you can use them for a lot more involved tasks than they can do now.