Gwern Branwen
And I would just pay attention and notice that the world over time looked more like their world than it looked like my world, where algorithms are super important and you need like deep insight to do stuff, you know.
And then GPT-1 came out, and I was like, wow, this unsupervised sentiment neuron is just learning on its own, right?
That seemed pretty amazing.
It also was a very compute-centric view.
You just build the transformer, and the intelligence will come.
And then GPT-2 came out, and I had this holy shit moment.
You look at the prompting and the summarization, like, holy shit, do we live in their world?
And then GPT-3 comes out, and that was really the crucial test.
It was a huge, huge scale-up, one of the biggest scale-ups in all of neural network history, going from GPT-2 to GPT-3.
And it wasn't like it was a super narrow, specific task like Go.
It really seemed like it was the crucial test.
If scaling were bogus, then the GPT-3 paper should have just been totally unimpressive and wouldn't show anything that important.
Whereas if scaling were true, you would just automatically be guaranteed to get so much more impressive results out of it than you had seen with GPT-2.
So I opened up the first page, maybe the second page, and I saw a few-shot learning chart.
And I'm like, holy shit, we are living in the scaling world.
Legg and Moravec and Kurzweil were right.
Then I turned to Twitter, and everyone else was like, oh, you know, this shows that scaling works so badly.
Why, it's not even state of the art.