Sholto Douglas
Because you use more compute as you train on more and more difficult tasks.
I mean, I don't know, your rate of improvement in biology is going to be somewhat bounded by the time it takes a cell to grow in a way that your rate of improvement in math isn't, for example.
So, yes.
But I think for many things we'll be able to parallelize widely enough and get enough iteration loops.
That depends on whether or not you think there's a virtue in pre-training a new architecture.
Basically, if you make some architectural change, then you probably need to do some form of at least retraining a new model.
But there's a trade-off equation here, right?
There is science to do, which everyone is doing, on what the optimal point is at which to do RL.
Because you need something which can both learn and discover the sparse reward itself.
So you don't want a one-parameter model. It's useless, even though you can run it really fast.
You also don't want a 100T model because it's super slow.
MARK MANDELMANN- Yeah, password RL.
And the marginal benefit of its learning efficiency is not worth it.
So there's a pretty big frontier here.
What's the optimal model size given your current class of capabilities, your current set of RL environments, and this kind of stuff?
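As a rough illustration of that frontier, here is a minimal toy sketch; the functional forms and constants are assumptions, not real scaling laws. It assumes a model must clear some capability threshold before it can discover the sparse reward at all, and that each RL iteration costs roughly linearly more as the model grows, then numerically picks the size that maximizes progress per unit of compute.

```python
# Toy sketch (assumptions, not a real scaling law): a model too small can never
# find the sparse reward; a model too big makes every iteration loop expensive.
import numpy as np

def toy_progress_per_budget(n_params, flops_budget=1e24,
                            solvable_at=1e9, sharpness=4.0):
    """Assumed RL progress = P(model can discover the sparse reward)
    x (number of iteration loops the compute budget buys)."""
    x = np.log10(n_params)
    p_discover = 1.0 / (1.0 + np.exp(-sharpness * (x - np.log10(solvable_at))))
    iterations = flops_budget / n_params   # assumed: cost per loop ~ linear in params
    return p_discover * iterations

sizes = np.logspace(6, 14, 400)            # 1M .. 100T parameters
best = sizes[np.argmax(toy_progress_per_budget(sizes))]
print(f"toy optimum under these assumptions: ~{best:.1e} parameters")
```

Under these made-up numbers the optimum lands at an interior point: big enough to reliably find the reward, small enough to keep the iteration loops cheap.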
Yeah, with my total pool of compute, how do I allocate that across training compute, data-generation compute, and inference compute for the RL training?
Yeah.
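As a rough sketch of that allocation question, with made-up costs rather than anything from the conversation: split a fixed pool of FLOPs between generating rollouts (the data/inference side) and taking gradient steps (the training side), assume each step consumes a fixed batch of fresh rollouts, and look for the split where neither side is the bottleneck.

```python
# Toy allocation sketch (all costs are assumptions): progress is limited by
# whichever is scarcer, fresh rollouts or affordable gradient steps.
def toy_best_split(total_flops=1e24, flops_per_rollout=1e12,
                   flops_per_step=1e15, rollouts_per_step=256):
    best = (0.0, 0.0)
    for i in range(1, 100):
        f = i / 100.0                                          # fraction spent on rollouts
        rollouts = f * total_flops / flops_per_rollout
        steps_data_allows = rollouts / rollouts_per_step
        steps_compute_allows = (1 - f) * total_flops / flops_per_step
        steps = min(steps_data_allows, steps_compute_allows)   # the bottleneck
        if steps > best[1]:
            best = (f, steps)
    return best

frac, steps = toy_best_split()
print(f"toy optimum: ~{frac:.0%} of compute on rollouts, ~{steps:.2e} steps")
```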
So I think, once again, it's worth considering the spectrum of possible worlds and preparing yourself for that.
And the sort of action that I think is highest-EV in that case is that, at a minimum, you are about to get dramatically more leverage.