
Andrej Karpathy

👤 Speaker
3433 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

And they think through things. There's nothing in current LLMs that does this. There's no equivalent of it. But I do see papers popping out that are trying to do this, because it's obvious to everyone in the field.

So I kind of see it as like, the first imitation learning actually, by the way, was extremely surprising and miraculous and amazing: that we can fine-tune by imitating humans. And that was incredible. Because in the beginning, all we had was base models. Base models are autocomplete.
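A toy illustration of the "autocomplete" point: a model trained purely on next-token prediction just continues whatever prefix it is given. Here a word-level bigram counter stands in for the pretrained transformer; the corpus and greedy decoding are invented for the sketch.

```python
from collections import Counter, defaultdict

# Tiny "pretraining" corpus, word-level for readability.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Pretraining": count which word follows which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def autocomplete(prompt, n=4):
    """Greedily continue the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(n):
        candidates = follows[words[-1]].most_common(1)
        if not candidates:
            break
        words.append(candidates[0][0])
    return " ".join(words)

# The base model doesn't answer or converse; it only continues text.
completion = autocomplete("the cat")
```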

And it wasn't obvious to me at the time, and I had to learn this, and the paper that blew my mind was InstructGPT. Because it pointed out that, hey, you can take the pre-trained model, which is autocomplete, and if you just fine-tune it on text that looks like conversations, the model will very rapidly adapt to become very conversational. And it keeps all the knowledge from pre-training. And this blew my mind, because I didn't understand that it could stylistically adjust so quickly and become an assistant to a user through just a few loops of fine-tuning on that kind of data. It was very miraculous to me that that worked. So incredible. And that was like two, three years of work.
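The data-prep side of that idea can be sketched in a few lines: conversational fine-tuning reuses the next-token-prediction format the base model already knows, just on text laid out as chat turns, with the loss applied only to the assistant's spans. The ChatML-style delimiter tokens here are an assumption for illustration, not InstructGPT's actual format, and real pipelines mask per token rather than per character.

```python
# Hypothetical chat delimiters (ChatML-style), assumed for the sketch.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def render_chat(turns):
    """Flatten (role, content) turns into one training string plus a
    per-character loss mask (True where the model is supervised)."""
    text, mask = "", []
    for role, content in turns:
        chunk = f"{IM_START}{role}\n{content}{IM_END}\n"
        text += chunk
        # Supervise only the assistant's chunk; user text is context.
        mask.extend([role == "assistant"] * len(chunk))
    return text, mask

turns = [
    ("user", "What is 2+2?"),
    ("assistant", "4"),
]
text, mask = render_chat(turns)
```

Training on many such strings is still ordinary autocomplete; the conversational formatting is what steers the model toward assistant behavior.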

And now came RL. And RL allows you to do a bit better than just imitation learning, right? Because you can have these reward functions and you can hill climb on the reward functions. And so some problems have just correct answers.
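For problems with a verifiable correct answer, hill climbing on a reward can be sketched with a toy policy: sample, score, and upweight what scored well. The candidate set, learning rate, and tabular "policy" are all invented for the sketch; real RL fine-tuning updates LLM weights, not a probability table.

```python
import random

random.seed(0)

CANDIDATES = ["3", "4", "5"]  # hypothetical answer space
CORRECT = "4"

def reward(answer):
    """Verifiable reward: 1 for the correct answer, else 0."""
    return 1.0 if answer == CORRECT else 0.0

# Start from a uniform policy over the candidates.
probs = {a: 1.0 / len(CANDIDATES) for a in CANDIDATES}

def sample(probs):
    r, acc = random.random(), 0.0
    for a, p in probs.items():
        acc += p
        if r <= acc:
            return a
    return a  # float-rounding fallback

lr = 0.05
for step in range(200):
    a = sample(probs)
    r = reward(a)
    # REINFORCE-style update: upweight sampled answers by their reward.
    for cand in probs:
        grad = (1.0 if cand == a else 0.0) - probs[cand]
        probs[cand] += lr * r * grad
    # Renormalize to keep a valid distribution.
    total = sum(probs.values())
    probs = {c: p / total for c, p in probs.items()}

# The policy hill-climbs onto the answer the reward function verifies.
```

Because the reward only fires on the correct answer, the update is zero for wrong samples and shifts probability mass toward "4" on every correct one.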