John Schulman
So I think that same kind of ability could be used for something like active learning.
Well, I think policy gradient algorithms are not the most sample-efficient algorithms.
So that's probably not what you want to do at test time if you want to learn really fast.
But who knows?
I mean, maybe it's not that bad.
So I think something like motor learning in animals is probably something like a policy gradient algorithm.
For example, you're learning how to shoot baskets.
That takes maybe thousands of tries to get more accurate.
There's probably something that's like a policy gradient algorithm underneath.
But that's not going to be the fastest way to learn if you have a model trying to do a project or some kind of task.
So I would think we would want to rely more on in-context learning, where you effectively have a learned algorithm.
You've learned how to explore, you've learned how to try all the possibilities exhaustively, instead of doing the same thing over and over again and making the same mistake.
So yeah, I would say we'll be able to do things that look more like learned search algorithms.
And that'll be the kind of thing that gets used in a particular task.
Interesting.
Yeah, so before ChatGPT, OpenAI had these instruction-following models.
The idea there was that we had base models, and people could prompt them in elaborate ways, but they were also kind of hard to prompt.
They basically do autocomplete.