Andrej Karpathy
So when you're reading a book, I almost don't even feel like the book is exposition I'm supposed to be attending to and training on.
The book is a set of prompts for me to do synthetic data generation, or for you to get into a book club and talk about it with your friends.
And it's by manipulating that information that you actually gain that knowledge.
And I think we have no equivalent of that, again, with LLMs.
They don't really do that, but I'd love to see, during pre-training, some kind of stage where the model thinks through the material, tries to reconcile it with what it already knows, and reflects on it for some amount of time, and I'd love to see someone get that to work.
And so there's no equivalent of any of this.
This is all research.
There are some very subtle reasons, which I think are hard to understand, why it's not trivial.
So let me just describe one.
Why can't we just synthetically generate data and train on it?
Well, because if you look at any single synthetic example, say a generation of the model thinking about a book, it looks great.
Why can't I train on it?
Well, you could try, but the model will actually get much worse if you keep doing it.
And that's because all of the samples you get from models are silently collapsed.
It's not obvious if you look at any individual example, but collectively they occupy a very tiny manifold of the possible space of thoughts about the content.
So LLMs, when they come out of training, are what we call collapsed.
They have a collapsed data distribution.
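To make that concrete, here is a toy illustration (my sketch, not anything from the conversation): treat a fitted Gaussian as the "model," sample synthetic data from it, refit on those samples, and repeat. Each generation's batch looks like plausible data on its own, but numpy's default std() is the maximum-likelihood estimator, which is biased slightly low, and the sampling noise compounds, so the distribution quietly collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data and an initial model fit to it.
data = rng.normal(loc=0.0, scale=1.0, size=20)
mu, sigma = data.mean(), data.std()

# Each generation: sample synthetic data from the current model,
# then refit the model on its own samples. Any single batch looks
# like reasonable Gaussian data, but the spread decays over the
# generations until the model produces nearly identical outputs.
for gen in range(201):
    samples = rng.normal(loc=mu, scale=sigma, size=20)  # synthetic data
    mu, sigma = samples.mean(), samples.std()           # "train" on it
    if gen % 50 == 0:
        print(f"generation {gen:3d}: sigma = {sigma:.5f}")
```

The same dynamic, in a much higher-dimensional space, is what makes naive training on a model's own generations degrade its diversity rather than improve it.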
One easy way to see it is to go to ChatGPT and ask it to tell you a joke.
It only has like three jokes.
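You can measure this kind of collapse directly. A minimal sketch, assuming the openai Python SDK with an API key in the environment and a chat model name like gpt-4o-mini (swap in whichever client and model you actually use): sample the same prompt many times and count the distinct completions.

```python
from collections import Counter
from openai import OpenAI  # assumes the openai SDK and OPENAI_API_KEY are set up

client = OpenAI()

# Sample the same prompt repeatedly at normal temperature and count
# how many distinct completions come back. A collapsed model keeps
# landing on the same handful of modes.
jokes = []
for _ in range(50):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=[{"role": "user", "content": "Tell me a joke."}],
        temperature=1.0,
    )
    jokes.append(resp.choices[0].message.content.strip())

counts = Counter(jokes)
print(f"{len(counts)} distinct completions out of {len(jokes)} samples")
for joke, n in counts.most_common(5):
    print(f"{n:2d}x  {joke[:60]}")
```

Note that exact-string matching actually understates the problem: rewordings of the same joke count as distinct strings, so the collapse is even more severe semantically than this count suggests.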