
Andrej Karpathy

👤 Speaker
3419 total appearances


Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

You're given a math problem, and you're trying to find a solution.

Now, in reinforcement learning, you will try lots of things in parallel first.

So you're given a problem, you try hundreds of things, different attempts.

And these attempts can be complex, right?

They can be like, oh, let me try this, let me try that, this didn't work, that didn't work, et cetera.

And then maybe you get an answer.

And now you check the back of the book and you see, okay, the correct answer is this.

And then you can see that, okay, this one, this one, and that one got the correct answer, but these other 97 of them didn't.

So literally what reinforcement learning does is it goes to the ones that worked really well, and every single thing you did along the way, every single token gets up-weighted of, like, do more of this.

The problem with that is, I mean, people will say that your estimator has high variance, but, I mean, it's just noisy.

It's noisy.

So basically, it kind of almost assumes that every single little piece of the solution you made that arrived at the right answer was the correct thing to do, which is not true.
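The "high variance" complaint has a standard form. The procedure described here is the REINFORCE policy-gradient estimator, which multiplies the gradient of every token by one scalar return (a sketch in standard notation; here $\tau$ is a sampled attempt, $R(\tau)$ is the back-of-the-book check, and $a_t$ is the $t$-th token):

```latex
\hat{g} \;=\; R(\tau) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid a_{<t})
```

Because the single scalar $R(\tau)$ multiplies every term in the sum, a token taken in a wrong alley receives exactly the same credit as the step that actually solved the problem; averaging over many attempts keeps the estimator unbiased, but the per-token credit assignment is noise.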

Like, you may have gone down the wrong alleys until you write the right solution.

Every single one of those incorrect things you did, as long as you got to the correct solution, will be up-weighted as do more of this.

It's terrible.

It's noise.

You've done all this work, and at the end you get just a single number: like, oh, you got it correct.

And based on that single number, you up-weight or down-weight the entire trajectory.
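The loop described in this excerpt can be sketched in a few lines. This is a toy illustration, not anyone's production setup: the "policy" is a hypothetical vector of per-token Bernoulli logits, and the "verifier" is a made-up reward where only the last token decides success, so every earlier token is a wrong alley. REINFORCE still up-weights all tokens of each successful attempt; only averaging over many rollouts washes the non-causal credit out as noise.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

T = 5  # tokens per attempt

def sample_attempt(theta):
    # One "attempt" at the problem: a sequence of T binary tokens,
    # token t drawn with probability sigmoid(theta[t]).
    return [1 if random.random() < sigmoid(th) else 0 for th in theta]

def reward(tokens):
    # Hypothetical "back of the book" check: only the final token
    # decides success; the earlier tokens are wrong alleys.
    return 1.0 if tokens[-1] == 1 else 0.0

def reinforce_step(theta, n_rollouts=100, lr=0.1):
    # Try many things in parallel, score each only by its final outcome,
    # and up-weight EVERY token of the attempts that happened to succeed.
    grads = [0.0] * T
    for _ in range(n_rollouts):
        tokens = sample_attempt(theta)
        r = reward(tokens)
        for t, tok in enumerate(tokens):
            # d/dtheta log Bernoulli(tok; sigmoid(theta)) = tok - sigmoid(theta)
            grads[t] += r * (tok - sigmoid(theta[t]))
    return [th + lr * g / n_rollouts for th, g in zip(theta, grads)]

theta = [0.0] * T
for _ in range(50):
    theta = reinforce_step(theta)

# theta[-1] is pushed firmly positive (that token caused the reward);
# theta[0..T-2] receive pure-noise updates and just jitter around zero.
```

The last comment is the whole point: inside any single successful rollout, all five tokens were up-weighted equally, and only the average over hundreds of rollouts reveals that four of the five updates were noise.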