
Andrej Karpathy

Speaker
3419 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

I don't know that I fully resonate with that because I feel like these models, when you boot them up and they have zero tokens in the window, they're always like restarting from scratch where they were.

So I don't actually know in that worldview what it looks like because, again, maybe making some analogies to humans just because I think it's roughly concrete and kind of interesting to think through.

I feel like when I'm awake, I'm building up a context window of stuff that's happening during the day.

But I feel like when I go to sleep, something magical happens where I don't actually think that that context window stays around.

I think there's some process of distillation into weights of my brain.

And this happens during sleep and all this kind of stuff.

We don't have an equivalent of that in large language models.

And that's, to me, more adjacent to when you talk about continual learning and so on as absent.

These models don't really have this distillation phase of taking what happened, analyzing it, obsessively thinking through it, basically doing some kind of a synthetic data generation process and distilling it back into the weights, and maybe having a specific neural net per person.
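The distillation idea described above — let a frozen "teacher" (standing in for what sits in the context window) label synthetic data, then train a small "student" so the knowledge ends up in weights — can be sketched as a toy in NumPy. Everything here (linear models, dimensions, learning rate) is an illustrative assumption, not Karpathy's proposal:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

teacher_w = rng.standard_normal(d)   # fixed: knowledge held "in context"
student_w = np.zeros(d)              # the weights we distill into

# Synthetic data generation: sample inputs, let the teacher label them.
X = rng.standard_normal((256, d))
y = X @ teacher_w

# Distillation: regress the student onto the teacher's outputs with SGD on MSE.
lr = 0.01
for _ in range(1000):
    pred = X @ student_w
    grad = 2 * X.T @ (pred - y) / len(X)
    student_w -= lr * grad

# After distillation, the student reproduces the teacher without the "context".
print(np.max(np.abs(student_w - teacher_w)))
```

The point of the toy is the data flow, not the model class: the teacher is never deployed, and only the student's weights persist, which mirrors "context in, weights out."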

Maybe it's a LoRA, it's not a full...

Yeah, it's not a full-weight neural network.

It's just that a small, sparse subset of the weights is changed.
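A minimal sketch of the LoRA idea mentioned here: keep the pretrained weight matrix W frozen and add a trainable low-rank update B @ A, so a per-person adapter needs only r * (d_in + d_out) parameters instead of d_in * d_out. Dimensions and initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-init: adapter starts as a no-op

def forward(x):
    # Base model output plus the low-rank per-user adaptation.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B = 0, outputs match the frozen base model exactly.
assert np.allclose(forward(x), W @ x)

# The adapter is a small fraction of the full weight count.
full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(lora_params / full_params)  # 0.125
```

Zero-initializing B is the standard trick: the adapted model starts identical to the base model, and training only gradually moves it away.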

But basically, we do want to create ways of creating these individuals that have very long contexts.

It's not only remaining in the context window, because the context windows grow very, very long.

Maybe we have some very elaborate sparse attention over it.

But I still think that humans obviously have some process for distilling some of that knowledge into the weights.

We're missing it.

And I do also think that humans have some kind of a very elaborate sparse attention scheme, which I think we're starting to see some early hints of.

So DeepSeek v3.2 just came out, and I saw that they have like a sparse attention as an example.

And this is one way to have very, very long context windows.
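One simple instance of sparse attention for long contexts is top-k selection: each query attends only to its k highest-scoring keys rather than the whole sequence. This is a generic sketch of the idea, not DeepSeek's actual mechanism, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, k = 128, 16, 8

q = rng.standard_normal(d)            # one query
K = rng.standard_normal((seq_len, d)) # keys over a long context
V = rng.standard_normal((seq_len, d)) # values over a long context

scores = K @ q / np.sqrt(d)              # scaled dot-product scores
topk = np.argpartition(scores, -k)[-k:]  # indices of the k best keys

# Softmax only over the selected keys; all other positions get zero weight.
s = scores[topk]
w = np.exp(s - s.max())
w /= w.sum()
out = w @ V[topk]  # attend to k tokens instead of seq_len
```

In this toy the scores are still computed densely; practical long-context schemes also make the selection step itself cheap, which is where the engineering effort goes.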