
Andrej Karpathy
👤 Person · 3,419 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

I will say that... there have been some papers that I thought were interesting that actually look at the mechanisms behind in-context learning. And I do think it's possible that in-context learning actually runs a small gradient descent loop internally, in the layers of the neural network.

And so I recall one paper in particular where they were doing linear regression, actually, using in-context learning. So basically, your inputs into the neural network are (x, y) pairs: x, y, x, y, x, y that happened to be on the line. And then you do x and you expect the y. And the neural network, when you train it in this way, actually does do linear regression.
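
As a concrete sketch of the setup he's describing, here is what one training example for that kind of in-context linear-regression task might look like (along the lines of Garg et al., "What Can Transformers Learn In-Context?", 2022; the prompt format, dimensions, and function name below are illustrative assumptions, not any paper's exact code):

```python
import numpy as np

def make_icl_regression_example(n_pairs=8, dim=4, rng=None):
    """Build one in-context linear-regression example: a sequence of
    (x, y) pairs that all lie on the same random 'line' (a linear map),
    followed by a query x. The network is trained to output y_query
    from the sequence alone, with no weight updates at test time."""
    rng = rng or np.random.default_rng()
    w = rng.normal(size=dim)                  # the "line" for this prompt
    xs = rng.normal(size=(n_pairs + 1, dim))  # n context points + 1 query
    ys = xs @ w
    # The prompt fed to the network: x1, y1, x2, y2, ..., xn, yn, x_query
    prompt = [tok for x, y in zip(xs[:-1], ys[:-1]) for tok in (x, y)]
    prompt.append(xs[-1])
    return prompt, ys[-1]                     # (input sequence, target y)
```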

And normally when you run linear regression, you have a small gradient descent optimizer that basically looks at (x, y), looks at the error, calculates the gradient of the weights, and does the update a few times.
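
That outer loop is the standard recipe; a minimal sketch of it in plain NumPy (the learning rate and step count are arbitrary choices for illustration):

```python
import numpy as np

def linear_regression_gd(xs, ys, lr=0.1, steps=100):
    """Fit y ≈ xs @ w by ordinary gradient descent on mean squared error."""
    w = np.zeros(xs.shape[1])
    for _ in range(steps):
        err = xs @ w - ys            # look at (x, y) and compute the error
        grad = xs.T @ err / len(ys)  # gradient of the loss w.r.t. w
        w -= lr * grad               # do the update
    return w
```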

It just turns out that when they looked at the weights of that in-context learning algorithm, they actually found some analogies to gradient descent mechanics. In fact, I think the paper was even stronger, because they actually hard-coded the weights of a neural network to do gradient descent through attention and all the internals of the neural network.
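
The construction he seems to be recalling (most likely von Oswald et al., "Transformers Learn In-Context by Gradient Descent", 2023) rests on an identity along these lines: starting from zero weights, one gradient descent step on the context's squared error yields the same prediction as a single linear-attention readout over the (x, y) tokens. A small numerical check of that identity, with arbitrary dimensions and learning rate (the variable names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 16, 4, 0.01
xs = rng.normal(size=(n, d))
ys = xs @ rng.normal(size=d)          # context pairs on a random "line"
x_q = rng.normal(size=d)              # query point

# One gradient descent step on 0.5 * sum((w @ x_j - y_j)^2), from w0 = 0:
w1 = lr * xs.T @ ys                   # w1 = w0 - lr * grad = lr * sum(y_j * x_j)
pred_gd = w1 @ x_q

# One linear-attention readout: values y_j, unnormalized scores x_j . x_q:
pred_attn = lr * ys @ (xs @ x_q)

assert np.isclose(pred_gd, pred_attn)  # the two predictions coincide
```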

So I guess that's just my only pushback: who knows how in-context learning works, but I actually think it's probably doing a little bit of some kind of funky gradient descent internally, and I think that's possible. So I guess I was only pushing back on you saying it's not doing in-context learning. Who knows what it's doing, but it's probably doing something similar to it, but we don't know.

I think I kind of agree.

I mean, the way I usually put this is that for anything that happens during the training of the neural network, the knowledge is only kind of a hazy recollection of what happened at training time. And that's because the compression is dramatic: you're taking 15 trillion tokens and compressing them into a final network of just a few billion parameters. So obviously there's a massive amount of compression going on. So I kind of refer to it as a hazy recollection of the internet documents.
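
For a rough sense of scale on that compression claim (the bytes-per-token and bytes-per-parameter figures below are back-of-the-envelope assumptions, not numbers from the conversation):

```python
# ~15T tokens of text vs. a few billion parameters of weights:
# even before any information-theoretic caveats, the raw storage
# ratio works out to thousands-to-one.
tokens = 15e12
bytes_per_token = 4          # rough average for web text (assumption)
params = 8e9                 # "a few billion" (assumption)
bytes_per_param = 2          # bf16 storage (assumption)
ratio = (tokens * bytes_per_token) / (params * bytes_per_param)
print(f"~{ratio:,.0f}x raw compression")  # ~3,750x with these assumptions
```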