Keri Briske
Take me there.
Yeah, it's a lot of the post-training.
It's a lot of this reinforcement learning.
Again, when you do reinforcement learning, you take a question or a query, and what we actually do is synthetically generate at least 16 different variations of how you might ask the same question.
And for each of those variations, you run that through a gym.
And so this gym is a bunch of different environments.
And so you might have a math gym: if it's a math question, how might I answer that?
What's the algorithm I might do?
If it's a tool-calling gym, what tools do I need?
How would I think about this and how would I go get that?
But you can imagine there's all kinds of different environments.
And if you look out in the community, it's really great, because all these gym environments are starting to pop up.
So you do this post-training just to give it this breadth of knowledge.
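The variant-generation-plus-gym loop described above could be sketched roughly like this. Everything here is illustrative: the function names, the toy reward rules, and the stand-in paraphrase step are assumptions, not the actual training stack being described.

```python
# Hypothetical sketch of the RL post-training loop described above:
# each question is expanded into ~16 paraphrased variants, and each
# variant is routed to a domain-specific "gym" environment that scores
# the model's answer. All names and reward rules are illustrative.

def generate_variants(question, n=16):
    """Stand-in for synthetic paraphrase generation (in practice this
    would call a model); here we just tag the question with an index."""
    return [f"(variant {i}) {question}" for i in range(n)]

def math_gym(prompt, answer):
    """Toy math environment: reward 1.0 only for a verifiably correct answer."""
    return 1.0 if answer.strip() == "4" else 0.0

def tool_gym(prompt, answer):
    """Toy tool-calling environment: reward for emitting a valid tool call."""
    return 1.0 if answer.startswith("CALL ") else 0.0

GYMS = {"math": math_gym, "tools": tool_gym}

def score_question(question, domain, model_answer_fn):
    """Run every variant through the matching gym and average the reward,
    which would then feed the reinforcement-learning update."""
    gym = GYMS[domain]
    variants = generate_variants(question)
    rewards = [gym(v, model_answer_fn(v)) for v in variants]
    return sum(rewards) / len(rewards)

# A fake "model" that always answers 4:
avg = score_question("What is 2 + 2?", "math", lambda prompt: "4")
```

The point of the per-domain dispatch is that each gym knows how to verify answers in its own domain (exact math checking, tool-call validity, and so on), which is what makes the reward signal usable for RL.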
We also rely on, for lack of a better term, these benchmarks that are out in the community.
So you want to make sure that you're at least meeting that minimum benchmark without overfitting to that.
So you at least want to check yourself, how am I doing on some math questions?
How am I doing on coding?
How am I doing on instruction following?
So you're just going to make sure that you're hitting those before you even release the smaller model.
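That release check could be sketched as a simple gate over per-category benchmark scores. The category names and thresholds below are assumptions for illustration, not the actual criteria being described.

```python
# Illustrative release gate for the benchmark step described above:
# before shipping the smaller model, verify it meets a minimum score
# in each category. Thresholds and category names are assumptions.

MIN_SCORES = {"math": 0.70, "coding": 0.60, "instruction_following": 0.80}

def release_ready(benchmark_results):
    """Return (ok, failures): ok is True only if every category meets
    its minimum threshold; failures lists the categories that fell short."""
    failures = [name for name, floor in MIN_SCORES.items()
                if benchmark_results.get(name, 0.0) < floor]
    return (not failures, failures)

ok, failures = release_ready({"math": 0.74, "coding": 0.65,
                              "instruction_following": 0.82})
```

The "without overfitting" caveat matters here: the gate is a sanity floor on held-out benchmarks, not a target to train against directly.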
Yeah, exactly.