Reiner Pope
It's a little bit nicer actually.
So data in pre-training plus this... oh, I didn't have the inefficiency over here either. Inefficiency.
Data in pre-training, plus some multiple, like alpha, times the data in RL,
is just going to end up equal to beta times the data in inference.
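Written out, the balance being described on the board might look like the following. The symbols $D_{\text{pre}}$, $D_{\text{RL}}$, and $D_{\text{inf}}$ are shorthand introduced here for the data in pre-training, RL, and inference; only $\alpha$ and $\beta$ are named in the conversation, and the inefficiency term is assumed to be folded into those two coefficients.

```latex
\[
  D_{\text{pre}} + \alpha \, D_{\text{RL}} = \beta \, D_{\text{inf}}
\]
```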
And then let's just roughly size the alpha.
This alpha, it's going to be...
It's maybe somewhere in the range of two to six over six, from this term compared to this term.
And then we've got an inefficiency term, which I would say is maybe in the range of 30%, something like that.
So this alpha is going to be something like one over 10, let's say.
And this beta here is actually the same.
It's a third.
It's one third times 30%.
So it also equals 1 in 10.
Something like that.
Yeah, okay.
We can make this like 2 in 10.
Make it a bit bigger.
So, yeah, just write it out once more.
This is 2 over 10.
This is 1 over 10.
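As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes that alpha is the "two to six over six" ratio multiplied by the roughly 30% inefficiency term, which is one reading of the spoken derivation, and that the final "2 over 10" refers to alpha and the "1 over 10" to beta, which the transcript does not state explicitly. The function `inference_data` and all variable names are hypothetical, introduced only for illustration.

```python
# Rough sizing of the alpha and beta coefficients from the discussion.
# Assumption: alpha = (ratio of the two terms) * inefficiency.
inefficiency = 0.30  # "maybe in the range of 30%"

alpha_low = (2 / 6) * inefficiency   # lower end of "two to six over six"
alpha_high = (6 / 6) * inefficiency  # upper end of "two to six over six"
beta = (1 / 3) * inefficiency        # "one third times 30%"


def inference_data(d_pretrain, d_rl, alpha, beta):
    """Solve d_pretrain + alpha * d_rl = beta * d_inf for d_inf."""
    return (d_pretrain + alpha * d_rl) / beta


# Assumption: the rounded values are alpha = 2/10 and beta = 1/10.
alpha_rounded = 2 / 10
beta_rounded = 1 / 10

print(f"alpha range: {alpha_low:.2f} to {alpha_high:.2f}")
print(f"beta: {beta:.2f}")
```

Under these assumptions alpha lands between about 0.1 and 0.3, so rounding it up from 1 in 10 to 2 in 10 stays inside the estimated range, while beta stays at about 1 in 10.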