Reiner Pope
๐ค SpeakerAppearances Over Time
Podcast Appearances
And then we can add in the inference cost.
The inference cost is 2, number of active times the data in inference.
Thank you.
And then inference is just two.
Yeah.
So we're going to solve for essentially maybe a quality of all three of these terms.
That is...
ballpark where people are going to be like labs have more information on what is productive in doing more RL for example than versus doing more pre-training I don't have that information but I think a good ballpark is 30-30 like 33% split between each of them actually I'm not sure I understand the intuition for that another naive model could have been that RL plus pre-training would be 50%
Yeah, that's also a valid answer as well.
Because this is heuristic, I can't really argue for one versus the other.
They don't differ by that much.
Like 33 versus 25 is only a small factor off.
So let's pick one of them.
All equals seems simple enough.
And so we're just going to solve for equality of them.
It's pretty straightforward.
We can immediately see that the number of activated parameters totally disappears.
And so let's factor that out.
And we're going to just say that data in pre-training
I decided to do it your way.