Dylan Patel
When you do an LLM, it's the same thing, right?
If you throw some data at it, it will memorize it before it generalizes.
It's this concept called grokking.
You grok the subject, i.e., you have the aha moment.
Yeah.
The models do the same thing.
They memorize it, and then at some point they understand it.
And if you make the model bigger and bigger and bigger without the data changing, it just memorizes everything.
And actually it starts to get worse again, because it never had the opportunity to generalize: the model was so big, there were so many weights, there was so much capacity for information.
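As a concrete illustration of the memorize-then-generalize pattern he's describing, here is a minimal sketch, assuming PyTorch, of the toy modular-arithmetic setup in which grokking was first reported. All the specifics (the architecture, the hyperparameters, the 50/50 split) are illustrative choices, not anything from the conversation; the characteristic trace is that train accuracy saturates early while validation accuracy sits near chance, then jumps much later.

```python
# Minimal grokking sketch: learn (a + b) mod P from half of all pairs.
# Train accuracy typically hits ~1.0 long before validation accuracy
# suddenly climbs, i.e. memorization first, generalization later.
import torch
import torch.nn as nn

P = 97  # modulus for the toy task
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_x, train_y = pairs[perm[:split]], labels[perm[:split]]
val_x, val_y = pairs[perm[split:]], labels[perm[split:]]

model = nn.Sequential(
    nn.Embedding(P, 128),  # shared embedding for both operands
    nn.Flatten(),          # (batch, 2, 128) -> (batch, 256)
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, P),
)
# Weight decay is the key knob: without it the model tends to stay
# stuck in the memorization regime far longer, or indefinitely.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(train_x).argmax(-1) == train_y).float().mean()
            va = (model(val_x).argmax(-1) == val_y).float().mean()
        # Typical trace: train acc saturates early; val acc lingers
        # near chance, then climbs abruptly much later (grokking).
        print(f"step {step}: train acc {tr:.2f}, val acc {va:.2f}")
```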
The challenge today is not necessarily to make the model bigger.
The challenge is how do I generate and create data in useful domains so that the model gets better at them?
Nowhere on the internet does it show you how to fly through a spreadsheet using only your keyboard and all these like functions and all these things, right?
That's the kind of thing you learn through repetition, but there's no data on the internet about it.
So how do you teach a model that?
It's not going to learn it from reading the internet over and over and over again, which you and I could never do.
So it has a level of intelligence that we can't match.
We can't read the whole internet, but it can't do basic stuff, like play with a spreadsheet.
How do you get it to learn these things?
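One hedged sketch of what "generate and create data in useful domains" could mean in practice: procedurally manufacturing supervised (instruction, keystroke) pairs for spreadsheet navigation and fine-tuning on them. The shortcuts below are real Excel defaults, but the task templates and the make_example helper are hypothetical, invented purely for illustration.

```python
# Hypothetical synthetic-data generator for keyboard-only spreadsheet
# skills: render real shortcuts into (prompt, completion) pairs that a
# model could be fine-tuned on, since this data isn't on the internet.
import json
import random

SHORTCUTS = {
    "jump to the last filled row of the current column": ["Ctrl+Down"],
    "select the entire current column": ["Ctrl+Space"],
    "sum the cells above into the active cell": ["Alt+="],
    "open the Go To dialog": ["F5"],
}

def make_example(rng: random.Random) -> dict:
    """Sample a task and render it as one supervised training example."""
    task, keys = rng.choice(list(SHORTCUTS.items()))
    return {
        "prompt": f"In the spreadsheet, {task} using only the keyboard.",
        "completion": " then ".join(keys),
    }

rng = random.Random(0)
dataset = [make_example(rng) for _ in range(1000)]
print(json.dumps(dataset[0], indent=2))
```

The point of the sketch is that the data never has to exist on the internet at all; it can be generated from a specification of the skill itself.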