Dwarkesh Patel
Now, GPT-3 already demonstrated in-context learning could be very powerful in 2020.
Its in-context learning capabilities were so remarkable that the title of the GPT-3 paper was "Language Models Are Few-Shot Learners."
But of course, we didn't solve in-context learning when GPT-3 came out.
And indeed, there's plenty of progress still to be made, from comprehension to context length.
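To make the idea concrete: "in-context learning" just means the model picks up a task from demonstrations placed directly in its prompt, with no weight updates. Here's a minimal sketch of how a few-shot prompt is assembled; the helper name and the translation examples are illustrative, not anything from the GPT-3 paper itself.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate input/output demonstrations, then append the new query.

    The model is expected to infer the task (here, English-to-French
    translation) purely from the pattern in the demonstrations.
    """
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    # The final line leaves "Output:" blank for the model to complete.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)


examples = [
    ("cheese", "fromage"),
    ("apple", "pomme"),
]
prompt = build_few_shot_prompt(examples, "house")
print(prompt)
```

The point of the paper's title is that conditioning on a handful of such demonstrations was often enough, with no fine-tuning at all.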
I expect a similar progression with continual learning.
Labs will probably release something next year which they call continual learning, and which will in fact count as progress towards continual learning.
But human-level, on-the-job learning may take another five to ten years to iron out.
This is why I don't expect runaway gains from the first model that cracks continual learning, where that one model keeps getting more widely deployed and more capable.
If fully solved continual learning dropped out of nowhere, then sure, it might be game, set, match, as Satya put it on the podcast when I asked him about this possibility.
But that's probably not what's going to happen.
Instead, some lab is going to figure out how to get some initial traction on this problem.
And then playing around with this feature will make it clear how it was implemented.
And then other labs will soon replicate the breakthrough and improve it slightly.
Besides, I just have some prior that the competition will stay pretty fierce between all these model companies.
And this is informed by the observation that all the previous supposed flywheels, whether that's user engagement on chat or synthetic data or whatever, have done very little to blunt the ever-fiercer competition between model companies.
Every month or so, the big three model companies will rotate around the podium, and the other competitors are not that far behind.
There seems to be some force, whether it's talent poaching, the rumor mill in SF, or just normal reverse engineering, that has so far neutralized any runaway advantage a single lab might have had.
I experimented with different video models to help me animate some of my essays.
But the thing is, my team and I are very opinionated about what we want the end product to look like.
And so for a video model to be useful to us, it needs to be able to follow our instructions for exactly what kind of shot and framing and lighting we want.