John Schulman
Like I only really pivoted what I was working on, and what my team was working on, after GPT-3.
So after that, we kind of got together and said,
this language model stuff works really well.
Let's see what we can do here.
But yeah, after GPT-2, I wasn't quite sure yet.
Yeah, there are some arguments for that.
I mean, right now it's a pretty lopsided ratio, but you could argue that the output generated by the model is higher quality than most of what's on the web.
So it sort of makes more sense for the model to think by itself instead of just like training to imitate what's on the web.
So I think there's a first principles argument for that.
I would say we found a lot of gains through post-training.
So I'm not sure.
So I would expect us to keep pushing this methodology and probably increasing the amount of compute we put into it.
Yeah, I would say that most of that is post-training.
Interesting.
So there are a lot of different separate axes for improvement.
We think about data quality, data quantity, just doing more iterations of the whole process of deploying, collecting new data, and changing what kind of annotations you're collecting.
So there's a lot of things that stack up, but together they give you a pretty good effective compute increase.
I'd say I just have a decent amount of experience at this point from the different parts of the stack, from RL algorithms, obviously, since I've worked on those since grad school, to the data collection, the annotation process, to...