Andrej Karpathy

👤 Speaker
3419 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

I'm going to tell you at every single step of the way how well you're doing.

And this is basically the reason we don't have that.

It's tricky how you do that properly because you have partial solutions and you don't know how to assign credit.

So when you get the right answer, it's just an equality match to the answer.

Very simple to implement.
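
The equality-match reward he describes is about as simple as a reward function gets. A minimal sketch (the function name and whitespace normalization are assumptions for illustration, not anything from the episode):

```python
# Minimal sketch of an outcome-based reward: the model gets credit only
# if its final answer exactly matches the known reference answer.
# Names and the normalization here are illustrative assumptions.

def outcome_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 on an exact match with the reference answer, else 0.0."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

# The whole multi-step solution receives a single scalar at the very end;
# no intermediate step of the reasoning gets its own partial credit.
print(outcome_reward("42", "42"))  # 1.0
print(outcome_reward("41", "42"))  # 0.0
```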

If you're doing basically process supervision, how do you assign partial credit in an automatable way?

It's not obvious how you do it.

Lots of labs, I think, are trying to do it with these LLM judges.

So basically, you get LLMs to try to do it.

So you prompt an LLM, hey, look at a partial solution of a student.

How well do you think they're doing if the answer is this?

And they try to tune the prompt.
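
A rough sketch of that judge setup might look like the following. Here `call_llm` is a hypothetical stand-in for whatever completion API a lab actually uses, and the prompt wording is invented for illustration:

```python
# Sketch of using an LLM as a judge to assign partial credit to an
# unfinished solution. `call_llm` is a hypothetical stand-in for a real
# completion API; the prompt text is illustrative, not from the episode.

def judge_partial_credit(call_llm, problem: str, partial_solution: str,
                         reference_answer: str) -> float:
    prompt = (
        "You are grading a student's partial solution.\n"
        f"Problem: {problem}\n"
        f"Partial solution so far: {partial_solution}\n"
        f"The correct final answer is: {reference_answer}\n"
        "On a scale from 0 to 1, how well is the student doing? "
        "Reply with only the number."
    )
    raw = call_llm(prompt)
    try:
        return min(max(float(raw.strip()), 0.0), 1.0)  # clamp score to [0, 1]
    except ValueError:
        return 0.0  # unparsable judge output: assign no credit
```

The only real lever in this setup is the prompt, which is why the next point matters: the reward is now a large learned model rather than an equality check.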

The reason that I think this is kind of tricky is quite subtle.

And it's the fact that anytime you use an LLM to assign a reward, those LLMs are giant things with billions of parameters and they're gameable.

And if you're reinforcement learning with respect to them, you will find adversarial examples for your LLM judges almost guaranteed.

You can't do this for too long.

You do maybe 10 steps or 20 steps and maybe it will work, but you can't do 100 or 1,000.

I understand it's not obvious why, but basically the model will find little cracks; it will find all these spurious things in the nooks and crannies of the giant model and find a way to cheat it.
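
The "10 or 20 steps" caveat from above can be made concrete with a toy loop. Everything below is a hypothetical sketch: `policy`, `rollout`, `judge_reward`, and `update` are placeholders for a real RL fine-tuning stack, not any lab's actual code:

```python
# Toy sketch of why optimizing against an LLM judge is only safe briefly.
# All arguments are hypothetical placeholders for a real RL setup.

MAX_JUDGE_STEPS = 20  # roughly the "10 or 20 steps" regime from the quote

def train_against_judge(policy, problems, rollout, judge_reward, update,
                        max_steps: int = MAX_JUDGE_STEPS):
    """Run a short burst of RL against a judge-based reward, then stop."""
    for _ in range(max_steps):
        solutions = [rollout(policy, p) for p in problems]             # sample solutions
        rewards = [judge_reward(p, s) for p, s in zip(problems, solutions)]
        policy = update(policy, problems, solutions, rewards)          # one policy update
    # Run this for hundreds or thousands of steps instead, and the policy
    # tends to find inputs that fool the judge rather than better solutions.
    return policy
```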

So one example that's prominent in my mind (I think this was probably made public) is: if you're using an LLM judge for a reward, you just give it a solution from a student and ask it whether the student did well or not,