John Schulman
So, I mean, if you have...
Yeah, if you collect a diverse data set, you're going to get a little bit of everything in it.
And if you have models that generalize really well, even if there are just a couple of examples of getting back on track.
I see.
Okay, interesting.
Or even, like, maybe in the pre-training there are examples of getting back on track.
Then, like, the model will be able to generalize from those other things it's seen to the current situation.
So I think, like...
If you have models that are weaker, you might be able to get them to do almost anything with enough data, but you might have to put a lot of effort into a particular domain or skill. Whereas a stronger model might just do the right thing without any training data or any effort.
Yeah, I would say at a high level, I would agree that longer horizon tasks are going to require more model intelligence to do well and are going to be more expensive to train for.
I'm not sure I would expect there to be a really clean scaling law unless you set it up in a very careful way or design the experiment in a certain way.
Because I would say there might end up being some phase transitions where, once you get to a certain level, you can deal with much longer tasks.
So, for example, when people do planning at different timescales, I'm not sure they use completely different mechanisms.
So we probably use the same mental machinery if we're thinking about one month from now, one year from now, or like a hundred years from now.
So we're not actually doing some kind of reinforcement learning where we need to worry about a discount factor that covers that timescale and so forth.
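The discount-factor point can be made concrete with a quick sketch. This is not from the conversation itself — it's an illustrative toy calculation, assuming (hypothetically) one decision step per day — showing why a standard per-step discount collapses rewards at human planning timescales:

```python
# Illustrative sketch (not from the conversation): how a per-step
# discount factor gamma weights rewards at human planning horizons.
# The "one step per day" framing is an assumption, chosen purely
# for illustration.

def discounted_weight(gamma: float, steps: int) -> float:
    """Weight gamma**steps that discounted RL assigns to a reward
    arriving `steps` decisions in the future."""
    return gamma ** steps

# Horizons from the conversation: a month, a year, a hundred years.
horizons = {"one month": 30, "one year": 365, "one hundred years": 36500}

for name, days in horizons.items():
    # With gamma = 0.99, a reward one year out is already weighted at
    # a few percent of an immediate reward; a century out it is
    # effectively zero.
    print(f"{name}: {discounted_weight(0.99, days):.2e}")
```

The point of the sketch: to make a century-scale goal matter under this scheme, gamma would have to be tuned absurdly close to 1, whereas a planner that reasons in language can just name the timescale directly.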
So I think using language, you can describe all of these different timescales, and then you can do things like plan.
In the moment, you can try to make progress towards your goal, whether it's a month away or 10 years away.
So I might expect the same out of models where