Andrej Karpathy
Whereas with anything that happens in the context window of the neural network, you're plugging in all the tokens and it's building up this KV cache representation, and that is very directly accessible to the neural net.
So I compare the KV cache and the stuff that happens at test time to more like a working memory.
Like all the stuff that's in the context window is very directly accessible to the neural net.
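The KV cache mechanic behind this can be sketched in a few lines. This is a minimal toy sketch, not any real model's implementation: a single attention head with random weights, where the names (`d_model`, `attend`, the cache lists) are all illustrative. The point it shows is that each new token appends a key/value pair to the cache, so attention at every step can directly "see" all earlier context tokens, which is what makes the working-memory analogy apt.

```python
# Toy single-head attention with a growing KV cache (illustrative only;
# random weights, hypothetical names -- not a real model's API).
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Toy projection matrices for one attention head.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

k_cache, v_cache = [], []  # grows by one entry per token processed

def attend(token_embedding):
    """Process one token: extend the KV cache, then attend over every cached token."""
    q = token_embedding @ W_q
    k_cache.append(token_embedding @ W_k)
    v_cache.append(token_embedding @ W_v)
    K = np.stack(k_cache)                    # (t, d_model): keys for all tokens so far
    V = np.stack(v_cache)
    scores = softmax(K @ q / np.sqrt(d_model))
    return scores @ V                        # weighted mix of every token seen so far

for t in range(5):
    out = attend(rng.standard_normal(d_model))

print(len(k_cache))  # 5 -- one cached key/value per context token
```

Every token in the window stays addressable through those cached keys and values, whereas knowledge baked into the weights has no such direct per-token handle.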
So there's always like these...
almost surprising analogies between LLMs and humans.
And I find them kind of surprising because we're not trying to directly build a human brain, of course.
We're just finding that this works and we're doing it.
But I do think that...
Anything that's in the weights, it's kind of like a hazy recollection of what you read a year ago.
Anything that you give it as a context at test time is directly in the working memory.
And I think that's a very powerful analogy to think through things.
So when you, for example, go to an LLM and you ask it about some book and what happened in it, like Nick Lane's book or something like that, the LLM will often give you some stuff, which is roughly correct.
But if you give it the full chapter and ask it questions, you're going to get much better results because it's now loaded in the working memory of the model.
So that's a very long way of saying that I basically agree with you, and that's why.
So maybe one way to think about it, and I don't know if this is the best way, but again, making these analogies, imperfect as they are:
We've stumbled on the transformer neural network, which is extremely powerful and very general.
You can train transformers on audio or video or text or whatever you want, and it just learns patterns, and they're very powerful, and it works really well.
That, to me, almost indicates that this is kind of like some piece of cortical tissue.