Andrej Karpathy

Andrej Karpathy — AGI is still a decade away

We were training with reinforcement learning against that reward function, and it worked really well, and then suddenly the reward became extremely large.

2818.78 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

It was a massive jump, and it did perfect.

2827.107 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

And you're looking at it like, wow, this means the student is perfect in all these problems.

2829.029 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

It's fully solved math.

2833.253 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

But actually what's happening is that when you look at the completions that you're getting from the model, they are complete nonsense.

2835.214 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

They start out okay, and then they change to da-da-da-da-da-da-da.

2839.999 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

So it's just like, oh, okay, let's take two plus three, and we do this and this, and then da-da-da-da-da-da-da-da.

2842.861 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

And you're looking at it and it's like, this is crazy.

2847.265 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

How is it getting a reward of one or 100%?

2848.79 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

And you look at the LLM judge and it turns out that the, the, the, the, the is an adversarial examples for the model and it assigns 100% probability to it.

2851.058 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

And it's just because this is an out-of-sample example to the LLM.

2858.639 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

It's never seen it during training, and you're in pure generalization land.

2861.824 View full episode →

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

It's never seen it during training, and in the pure generalization land, you can find these examples that break it.