Andrej Karpathy

👤 Speaker
3419 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

Whereas anything that happens in the context window of the neural network, where you're plugging in all the tokens and it's building up all this KV cache representation, is very directly accessible to the neural net.

So I compare the KV cache and the stuff that happens at test time to more like a working memory.

Like all the stuff that's in the context window is very directly accessible to the neural net.
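
To make the KV cache point above concrete, here is a minimal sketch (not anything shown in the episode) of single-head attention with a growing key/value cache: every token placed in the context window leaves keys and values that later steps can attend to directly. All names, shapes, and the toy dimensions are illustrative assumptions.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model = 16

# Random projection matrices standing in for trained attention weights.
W_q = torch.randn(d_model, d_model) / d_model ** 0.5
W_k = torch.randn(d_model, d_model) / d_model ** 0.5
W_v = torch.randn(d_model, d_model) / d_model ** 0.5

# The "working memory": keys and values for every token seen so far.
k_cache, v_cache = [], []

def attend(x_t):
    # Append this token's key/value to the cache, then attend over the
    # entire context window accumulated so far.
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)
    v_cache.append(x_t @ W_v)
    K = torch.stack(k_cache)              # (t, d_model)
    V = torch.stack(v_cache)              # (t, d_model)
    scores = (K @ q) / d_model ** 0.5     # similarity to every cached token
    weights = F.softmax(scores, dim=0)
    return weights @ V                    # a direct read over the whole cache

# Feed a few toy token embeddings; each step can see all previous ones.
for t in range(5):
    out = attend(torch.randn(d_model))
    print(f"step {t}: cache holds {len(k_cache)} tokens, output {tuple(out.shape)}")

Nothing in the weights changes here; only the cache grows, which is why material placed in the context is "directly accessible" in a way that pretraining knowledge is not.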

So there's always like these almost surprising analogies between LLMs and humans.

And I find them kind of surprising because we're not trying to build a human brain directly, of course.

We're just finding that this works and we're doing it.

But I do think that anything that's in the weights is kind of like a hazy recollection of what you read a year ago.

Anything that you give it as context at test time is directly in the working memory.

And I think that's a very powerful analogy to think through things.

So when you, for example, go to an LLM and ask it about some book and what happened in it, like Nick Lane's book or something like that, the LLM will often give you some stuff that is roughly correct.

But if you give it the full chapter and ask it questions, you're going to get much better results because it's now loaded in the working memory of the model.
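
As a rough illustration of that contrast, the only difference is whether the source text rides along in the prompt. This is a sketch under stated assumptions: llm_generate is a hypothetical stand-in rather than any real API, the book title is a placeholder (the interview only says "Nick Lane's book"), and the chapter text is left for you to supply.

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for whatever LLM API you actually call.
    return f"[model response to {len(prompt)} prompt characters]"

book = "The Vital Question"  # placeholder title

# 1) Recall from the weights alone: often roughly correct, like a hazy memory.
from_weights = llm_generate(
    f"What does {book} say about the origin of complex cells?"
)

# 2) The same question with the chapter placed directly in the context window,
#    so it sits in the model's working memory (its KV cache) while it answers.
chapter_text = "..."  # paste or load the actual chapter text here
from_context = llm_generate(
    f"Here is a chapter from {book}:\n\n{chapter_text}\n\n"
    "Using only this chapter, what does it say about the origin of complex cells?"
)

print(from_weights)
print(from_context)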

So I basically agree with your very long way of saying that I kind of agree, and that's why.

I almost feel like just a lot of it still...

So maybe one way to think about it, and I don't know if this is the best way, but I almost kind of feel like, again, making these analogies, imperfect as they are:

We've stumbled on the transformer neural network, which is extremely powerful and very general.

You can train transformers on audio or video or text or whatever you want; they just learn the patterns, they're very powerful, and it works really well.

That, to me, almost indicates that this is kind of like some piece of cortical tissue.
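
A small sketch of the generality point above, using stock PyTorch modules: once text, audio, or video has been embedded into a common vector size, the identical transformer stack processes all of them. The per-modality front ends and shapes here are illustrative assumptions, not anything from the episode.

import torch
import torch.nn as nn

d_model = 64
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)   # the modality-agnostic core

# Per-modality front ends that map raw inputs into the shared embedding space.
embed_text = nn.Embedding(1000, d_model)                # token ids -> vectors
embed_audio = nn.Linear(80, d_model)                    # e.g. spectrogram frames
embed_video = nn.Linear(3 * 16 * 16, d_model)           # flattened image patches

text = embed_text(torch.randint(0, 1000, (1, 32)))      # (1, 32, d_model)
audio = embed_audio(torch.randn(1, 50, 80))             # (1, 50, d_model)
video = embed_video(torch.randn(1, 20, 3 * 16 * 16))    # (1, 20, d_model)

for seq in (text, audio, video):
    print(backbone(seq).shape)   # same weights, same mechanism, any modality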