John Schulman
Podcast Appearances
So I think if you publish methods that are really hard to implement or really finicky, they'll tend to get forgotten.
And as a result, people actually try to open source their work a lot.
I guess there are also various unfavorable incentives.
Yeah, people are incentivized to make the baseline methods, the methods they're comparing to, worse.
There are other mild pathologies, like trying to make your method seem sophisticated mathematically.
But I would say overall, I feel like the field makes progress.
I would probably like to see a little bit more science, more trying to understand things, rather than hill climbing on benchmarks and trying to propose new methods.
And there's been a decent amount of that recently, but yeah, I think we could use more of that.
And I think that's a good thing for academics to work on.
Oh yeah, on the social sciences, on a slightly different note, I'd actually be really excited to see more research using base models to do simulated social science, because these models have a probabilistic model of the whole world. You can set up a simulated questionnaire or a conversation,
and you can look at how anything is correlated: for any traits that you might imagine, you can see how they might be correlated with other traits.
So it'd be pretty cool to see if people could replicate some of the more notable results in the social sciences, like moral foundations and that sort of thing, by just prompting base models in different ways and seeing what's correlated.
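As a rough illustration of the workflow described here (prompt a model with a simulated questionnaire, then see which answers are correlated), the following is a minimal sketch and not anything specified in the conversation. The survey statements, the persona labels, and the `fake_base_model` function are all hypothetical; `fake_base_model` just returns random ratings and stands in for a real base-model call, so the correlations it yields carry no meaning.

```python
import random

# Hypothetical survey statements, chosen only for illustration.
QUESTIONS = [
    "It is important to respect authority.",
    "It is important to preserve traditions.",
]


def build_prompt(persona: str, question: str) -> str:
    """Format one questionnaire item as a text prompt for a base model."""
    return (
        f"The following is a survey response from {persona}.\n"
        f"Statement: {question}\n"
        "Agreement on a scale of 1 (disagree) to 5 (agree):"
    )


def fake_base_model(prompt: str, rng: random.Random) -> int:
    """Placeholder for a real model call (assumption): returns a random 1-5 rating."""
    return rng.randint(1, 5)


def run_survey(personas, questions, model, seed=0):
    """Collect one rating per (persona, question); rows are personas."""
    rng = random.Random(seed)
    return [[model(build_prompt(p, q), rng) for q in questions] for p in personas]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    personas = [f"respondent {k}" for k in range(200)]  # hypothetical labels
    responses = run_survey(personas, QUESTIONS, fake_base_model)
    first = [row[0] for row in responses]
    second = [row[1] for row in responses]
    print("correlation between the two items:", pearson(first, second))
```

With a real base model in place of `fake_base_model`, the same loop could be run over many prompt variations to check whether known correlations between traits show up.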
What is that Stanford experiment?
Yeah, well, definitely there's always progress in improving the efficiency.
Whenever you have a 1D performance metric, you're going to find that different improvements can kind of substitute for each other.
So you might find that post-training and pre-training both improve the metrics, or they'll have a slightly different profile of which metrics they improve.
But if at the end of the day you have a single number, they're going to substitute for each other somewhat.