Andrej Karpathy
So I bet you the labs are trying.
I don't know, actually. I still think we need other ideas.
So, like, this idea of reviewing a solution and coming up with synthetic examples such that when you train on them, you get better, and you meta-learn it in some way.
And I think there's some papers that I'm starting to see pop out.
I'm only at the stage of reading abstracts, because a lot of these papers, you know, are just ideas.
Someone has to actually make it work at frontier LLM lab scale, in full generality.
Because when you see these papers pop up, they're just a little bit noisy, you know?
They're cool ideas, but I haven't actually seen anyone convincingly show that this is possible.
That said, the LLM labs are fairly closed, so who knows what they're doing now, but...
Yeah, I do think that we're missing some aspects there.
So as an example, when you're reading a book,
I almost feel like currently, when LLMs are reading a book, what that means is we stretch the text out into a sequence, and the model predicts the next token and gets some knowledge from that.
That's not really what humans do, right?
So when you're reading a book, I almost don't even feel like the book is like exposition I'm supposed to be attending to and training on.
The book is a set of prompts for me to do synthetic data generation.
or for you to get into a book club and talk about it with your friends.
And it's by manipulating that information that you actually gain that knowledge.
And I think we have no equivalent of that, again, with LLMs.
They don't really do that. But I'd love to see, during pre-training, some kind of a stage that thinks through the material, tries to reconcile it with what it already knows, thinks for some amount of time, and gets that to work.
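The idea sketched above could be made concrete as a data pipeline: instead of training only on the raw text stream, each passage is also used as a prompt to generate reflections, and both are interleaved into the training data. This is a minimal illustrative sketch, not anything described in the conversation; `reflect` is a hypothetical placeholder for an actual LLM call that would "think through" the passage.

```python
# Hedged sketch: treat a book passage as a prompt for synthetic data
# generation, rather than only as raw next-token-prediction targets.

def reflect(passage: str) -> list[str]:
    """Hypothetical stand-in for an LLM that manipulates the material:
    in a real system this would ask questions, restate claims, and try
    to reconcile the passage with prior knowledge."""
    return [
        f"Q: What is the main claim of: '{passage}'?",
        f"Restate in your own words: {passage}",
    ]

def build_training_stream(passages: list[str]) -> list[str]:
    """Interleave each raw passage with synthetic reflections on it,
    so training covers the manipulated material, not just the
    stretched-out text."""
    stream = []
    for p in passages:
        stream.append(p)             # the original exposition
        stream.extend(reflect(p))    # synthetic data generated from it
    return stream

stream = build_training_stream(["Gradient descent minimizes loss."])
```

The design point is that the synthetic items sit next to their source passage in the stream, which is one simple way to approximate "reading plus thinking it through" at the data level.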