
Andrej Karpathy

Speaker
3433 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

We don't have an equivalent of that in large language models. And that's, to me, more adjacent to when you talk about continual learning and so on as absent. These models don't really have this distillation phase of taking what happened, analyzing it, obsessively thinking through it, basically doing some kind of a synthetic data generation process and distilling it back into the weights, and maybe having a specific neural net per person.
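
The "distillation phase" described here can be caricatured in a few lines. A toy sketch, assuming nothing about any real system: generate synthetic examples that replay some experience, then fit a small model's weights to them. The `experience` function and the linear model are purely illustrative stand-ins.

```python
# Toy sketch: "distill what happened back into the weights" by
# generating synthetic data from an experience and fitting a model to it.
# Everything here is illustrative, not anyone's actual method.
import numpy as np

rng = np.random.default_rng(0)

def experience(x):
    # Stands in for "what happened": behavior we want baked into weights.
    return 3.0 * x - 1.0

# Synthetic data generation: replay / imagine many examples.
X = rng.uniform(-1, 1, size=(256, 1))
Y = experience(X)

# Distill into the weights of a tiny linear model via least squares.
A = np.hstack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, Y, rcond=None)[0].ravel()
print(round(w, 3), round(b, 3))  # recovers ~3.0 and ~-1.0
```

The point is only the shape of the loop: experience in, synthetic examples out, weights updated offline.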

Maybe it's a LoRA, it's not a full... Yeah, it's not a full-weight neural network. It's just a small, sparse subset of the weights that are changed.
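
A minimal sketch of the LoRA idea being gestured at here, assuming nothing about any particular library: instead of updating the full weight matrix `W`, learn a low-rank correction `A @ B` with far fewer parameters, leaving the base weights frozen.

```python
# LoRA-style low-rank adapter: effective weight is W + A @ B,
# but only A and B (2*d*r values) are trainable, not W (d*d values).
import numpy as np

d, r = 512, 8                            # model dim, adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen base weights
A = rng.standard_normal((d, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, d))                     # zero-init so adapter starts as a no-op

def forward(x):
    # Never materialize W + A @ B; apply the adapter as a side branch.
    return x @ W + (x @ A) @ B

full_params = d * d        # 262144
lora_params = 2 * d * r    # 8192, about 3% of the full matrix
print(lora_params / full_params)  # 0.03125
```

Zero-initializing `B` is the standard trick: at the start of training the adapter changes nothing, and all adaptation is concentrated in the small factors.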

But basically, we do want to create ways of creating these individuals that have very long contexts. It's not only remaining in the context window, because the context windows grow very, very long. Maybe we have some very elaborate sparse attention over it. But I still think that humans obviously have some process for distilling some of that knowledge into the weights. We're missing it.

And I do also think that humans have some kind of a very elaborate sparse attention scheme, which I think we're starting to see some early hints of. So DeepSeek v3.2 just came out, and I saw that they have sparse attention, as an example. And this is one way to have very, very long context windows.
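
DeepSeek's actual mechanism is more elaborate, but the family of tricks can be illustrated with a toy top-k sparse attention: each query attends only to the k highest-scoring keys instead of all T positions in a long context. The function below is an assumption-laden sketch, not any model's real implementation.

```python
# Toy top-k sparse attention over a long context: score all keys,
# keep only the k best, softmax over that subset, mix their values.
import numpy as np

def topk_sparse_attention(q, K, V, k):
    scores = K @ q                            # (T,) dot-product scores
    idx = np.argpartition(scores, -k)[-k:]    # indices of the k best keys
    s = scores[idx]
    w = np.exp(s - s.max())
    w /= w.sum()                              # softmax over selected keys only
    return w @ V[idx]                         # weighted sum of k values, not T

rng = np.random.default_rng(0)
T, d, k = 10_000, 64, 32                      # long context, tiny attended subset
q = rng.standard_normal(d)
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
out = topk_sparse_attention(q, K, V, k)
print(out.shape)                              # (64,)
```

The payoff is that the softmax and value mixing cost O(k) rather than O(T), which is what makes very long context windows affordable.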

So I almost feel like we are redoing a lot of the cognitive tricks that evolution came up with through a very different process. But we're, I think, going to converge on a similar architecture cognitively.

Well, the way I like to think about it is, okay, let's apply translation invariance in time, right? So 10 years ago, where were we? In 2015, we had convolutional neural networks primarily.