Dwarkesh Patel

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

But our actual learning during our lifetime is happening through some other process.

878.295

I actually don't fully agree with that, but you should continue with that.

882.34

Okay.

885.566

Actually, then I'm very curious to understand how that analogy breaks down.

885.686

So then it's worth thinking about: if in-context learning and pre-training are both implementing something like gradient descent, why does it feel like with in-context learning we're actually getting to this continual-learning, real-intelligence-like thing, whereas you don't get the analogous feeling just from pre-training?

999.773

At least you could argue that.

1019.015

And so if it's the same algorithm, what could be different?

1020.957

Well, one way you can think about it is: how much information does the model store per unit of information it receives from training?

1022.899

And if you look at pre-training, take Llama 3, for example: I think it's trained on 15 trillion tokens.

1031.711

And if you look at a 70B model, that would be the equivalent of 0.07 bits per token that it sees in pre-training, in terms of the information in the weights of the model compared to the tokens it reads.

1038.28

Whereas if you look at the KV cache and how it grows per additional token in in-context learning, it's like 320 kilobytes.

1050.398

So that's a 35-million-fold difference in how much information per token is assimilated by the model.

1058.096
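
The figures quoted above (0.07 bits per token, 320 kilobytes, 35 million-fold) can be reproduced with a short back-of-the-envelope sketch. The conversation does not say how they were derived, so the following are assumptions: weights and cache entries are stored at 16 bits each, and the model has the commonly cited Llama 3 70B attention shape of 80 layers, 8 key-value heads, and a head dimension of 128. All constant and function names are illustrative.

```python
# Back-of-the-envelope check of the quoted figures.
# Assumptions (not stated in the conversation): 16-bit weights,
# 16-bit KV cache entries, and a Llama 3 70B attention shape of
# 80 layers, 8 key-value heads, head dimension 128.

PARAMS = 70e9            # model parameters
TRAIN_TOKENS = 15e12     # pre-training tokens
BITS_PER_PARAM = 16      # assumed weight precision

N_LAYERS = 80            # assumed transformer layers
N_KV_HEADS = 8           # assumed key-value heads per layer
HEAD_DIM = 128           # assumed dimension per head
BYTES_PER_VALUE = 2      # assumed 16-bit cache entries


def weight_bits_per_training_token() -> float:
    """Total bits in the weights divided by tokens seen in pre-training."""
    return PARAMS * BITS_PER_PARAM / TRAIN_TOKENS


def kv_cache_bytes_per_token() -> int:
    """Bytes added to the KV cache by one extra token of context.

    The leading factor of 2 accounts for storing both a key and a value.
    """
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def fold_difference() -> float:
    """KV cache bits per token relative to weight bits per training token."""
    return kv_cache_bytes_per_token() * 8 / weight_bits_per_training_token()


if __name__ == "__main__":
    print(f"weights:  {weight_bits_per_training_token():.3f} bits per token")
    print(f"KV cache: {kv_cache_bytes_per_token() / 1024:.0f} KiB per token")
    print(f"ratio:    {fold_difference() / 1e6:.1f} million-fold")
```

Under these assumptions the weights come to roughly 0.075 bits per training token and the KV cache grows by 327,680 bytes (320 KiB) per token, a ratio of about 35 million. This compares storage footprints only; it does not measure how much usable information either store actually holds.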

I wonder if that's relevant at all.

1065.052

Stepping back, what is the part about human intelligence that we have most failed to replicate with these models?

1151.381

This is maybe relevant to the question of how fast these issues will be solved.

1275.983

So sometimes people will say about continual learning: look, actually, you could easily replicate this capability. Just as in-context learning emerged spontaneously as a result of pre-training, continual learning over longer horizons will emerge spontaneously if the model is incentivized to recollect information over longer horizons, or horizons longer than one session.

1281.575

So if there's some outer-loop RL which has many sessions within that outer loop, then this continual learning, where it fine-tunes itself or it writes to an external memory or something, will just sort of emerge spontaneously.

1305.379
