Andrej Karpathy
So I bet you the labs are trying.
I don't know, actually. I still think we need other ideas.
So, like, this idea of reviewing a solution and coming up with synthetic examples such that when you train on them, you get better, and you meta-learn it in some way.
And I think there's some papers that I'm starting to see pop out.
I'm only at the stage of reading abstracts, because a lot of these papers, you know, are just ideas.
Someone has to actually make it work at frontier LLM lab scale, in full generality.
Because when you see these papers pop up, they're just a little bit noisy, you know?
They're cool ideas, but I haven't actually seen anyone convincingly show that this is possible.
That said, the LLM labs are fairly closed, so who knows what they're doing now, but...
Yeah, I do think that we're missing some aspects there.
So as an example, when you're reading a book,
I almost feel like currently, when LLMs are reading a book, what that means is we stretch the text out into a sequence, and the model predicts the next token and gets some knowledge from that.
That's not really what humans do, right?
So when you're reading a book, I almost don't even feel like the book is like exposition I'm supposed to be attending to and training on.
The book is a set of prompts for me to do synthetic data generation.
or for you to get into a book club and talk about it with your friends.
And it's by manipulating that information that you actually gain that knowledge.
And I think we have no equivalent of that, again, with LLMs.
They don't really do that. But I'd love to see, during pre-training, some kind of a stage that thinks through the material, tries to reconcile it with what it already knows, thinks for some amount of time, and gets that to work.
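The idea sketched above could be made concrete as a data pipeline: instead of training only on the raw text stream, each passage is also used as a prompt to generate reflections, and both are interleaved into the training data. This is a minimal illustrative sketch, not anything described in the conversation; `reflect` is a hypothetical placeholder for an actual LLM call that would "think through" the passage.

```python
# Hedged sketch: treat a book passage as a prompt for synthetic data
# generation, rather than only as raw next-token-prediction targets.

def reflect(passage: str) -> list[str]:
    """Hypothetical stand-in for an LLM that manipulates the material:
    in a real system this would ask questions, restate claims, and try
    to reconcile the passage with prior knowledge."""
    return [
        f"Q: What is the main claim of: '{passage}'?",
        f"Restate in your own words: {passage}",
    ]

def build_training_stream(passages: list[str]) -> list[str]:
    """Interleave each raw passage with synthetic reflections on it,
    so training covers the manipulated material, not just the
    stretched-out text."""
    stream = []
    for p in passages:
        stream.append(p)             # the original exposition
        stream.extend(reflect(p))    # synthetic data generated from it
    return stream

stream = build_training_stream(["Gradient descent minimizes loss."])
```

The design point is that the synthetic items sit next to their source passage in the stream, which is one simple way to approximate "reading plus thinking it through" at the data level.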