Andrej Karpathy

Speaker
3433 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

So it's just like, oh, okay, let's take two plus three, and we do this and this, and then da-da-da-da-da-da-da-da.

And you're looking at it and it's like, this is crazy.

How is it getting a reward of one or 100%?

And you look at the LLM judge, and it turns out that "the the the the the" is an adversarial example for the model, and it assigns 100% probability to it.

And it's just because this is an out-of-sample example to the LLM.

It's never seen it during training, and you're in pure generalization land.

In pure generalization land, you can find these examples that break it.

Not even that.

Prompt injection is way too fancy.

You're finding adversarial examples, as they're called.

These are nonsensical solutions that are obviously wrong, but the model thinks they're amazing.
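
The failure described here can be illustrated with a toy. The following is a minimal sketch, not how any lab's judge works: the "judge" is a hypothetical bag-of-words scorer fitted on four made-up examples, and `TRAIN`, `fit_judge`, `judge`, and `greedy_exploit` are names invented for this illustration. It shows how anything that optimizes against a learned scorer can land on nonsense the scorer rates as perfect.

```python
from collections import defaultdict

# Hypothetical training data for a toy judge: (solution text, reward).
TRAIN = [
    ("the sum of 2 and 3 is 5", 1.0),
    ("adding 2 to 3 gives the answer 5", 1.0),
    ("2 plus 3 is 6", 0.0),
    ("answer 7", 0.0),
]

def fit_judge(examples):
    """Learn one weight per token: the mean reward of the examples containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, reward in examples:
        for tok in set(text.split()):
            totals[tok] += reward
            counts[tok] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}

def judge(weights, text, default=0.5):
    """Score a solution as the average weight of its tokens (unseen tokens get `default`)."""
    toks = text.split()
    if not toks:
        return 0.0
    return sum(weights.get(tok, default) for tok in toks) / len(toks)

def greedy_exploit(weights, length=4):
    """Stand-in for the optimized policy: repeat whichever token the judge likes most."""
    best = max(sorted(weights), key=weights.get)  # sorted() makes tie-breaking deterministic
    return " ".join([best] * length)

weights = fit_judge(TRAIN)
candidates = [
    "the sum of 2 and 3 is 5",   # sensible and correct
    "2 plus 3 is 6",             # sensible but wrong
    "the the the the",           # nonsense the judge never saw in training
    greedy_exploit(weights),     # nonsense found by optimizing against the judge
]
for candidate in candidates:
    print(f"{judge(weights, candidate):.2f}  {candidate}")
```

In this toy, the word "the" only ever appeared in high-reward training examples, so a string of nothing but "the" gets a perfect score. A real LLM judge is vastly more complex, but the shape of the problem is the same: the score is only trustworthy near the data the judge was trained on.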

Yeah.

I think the labs are probably doing all that.

Like, okay, so the obvious thing is, like, "the the the the the" should not get 100% reward.

Okay, well, take "the the the the the," put it in the training set of the LLM judge, and say, this is not 100%, this is 0%.

You can do this.

But every time you do this, you get a new LLM and it still has adversarial examples.

There are infinitely many adversarial examples.

And I think if you iterate this a few times, it'll probably be harder and harder to find adversarial examples.

But I'm not 100% sure because this thing has a trillion parameters or whatnot.
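
The patching loop described above (find an adversarial example, label it 0%, retrain the judge, repeat) can be sketched the same way. This is a toy under loud assumptions: the judge is a hypothetical per-token scorer, the training data is made up, and `fit_judge`, `score`, `find_exploit`, and `patch_loop` are names invented for this illustration, not anyone's actual pipeline.

```python
from collections import defaultdict

# Hypothetical starting data for a toy judge: (solution text, reward).
TRAIN = [
    ("the sum of 2 and 3 is 5", 1.0),
    ("adding 2 to 3 gives the answer 5", 1.0),
    ("2 plus 3 is 6", 0.0),
    ("answer 7", 0.0),
]

def fit_judge(examples):
    """Toy judge: each token's weight is the mean reward of the examples containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, reward in examples:
        for tok in set(text.split()):
            totals[tok] += reward
            counts[tok] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}

def score(weights, text):
    """Average token weight; unseen tokens get a neutral 0.5."""
    toks = text.split()
    return sum(weights.get(tok, 0.5) for tok in toks) / len(toks) if toks else 0.0

def find_exploit(weights, length=4):
    """Stand-in for the optimizer: repeat the judge's single favorite token."""
    best = max(sorted(weights), key=weights.get)  # sorted() makes tie-breaking deterministic
    return " ".join([best] * length)

def patch_loop(examples, rounds):
    """Each round: refit the judge, find an exploit, then label that exploit 0.0."""
    examples = list(examples)  # copy so the caller's data is not mutated
    history = []
    for _ in range(rounds):
        weights = fit_judge(examples)
        exploit = find_exploit(weights)
        history.append((exploit, score(weights, exploit)))
        examples.append((exploit, 0.0))  # "this is not 100%, this is 0%"
    return history

for round_number, (exploit, value) in enumerate(patch_loop(TRAIN, 9), start=1):
    print(f"round {round_number}: {value:.2f}  {exploit}")
```

In this toy, each patch closes one hole and the next round finds a different nonsense string with a perfect score. The exploits do eventually run out, but only because the vocabulary has a handful of words. The sketch says nothing about whether the same holds for a judge with a trillion parameters, which is exactly the uncertainty voiced in the quote.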