Andrej Karpathy
Okay, well, take the adversarial example, put it in the training set of the LLM judge, and say: this is not 100%, this is 0%.
You can do this.
But every time you do this, you get a new LLM and it still has adversarial examples.
There's an infinity of adversarial examples.
And I think if you iterate this a few times, it'll probably get harder and harder to find adversarial examples.
But I'm not 100% sure because this thing has a trillion parameters or whatnot.
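To make that loop concrete, here's a minimal toy sketch in Python. The judge and the adversarial search are hypothetical stand-ins for real lab infrastructure, not anything the labs are confirmed to run:

# Toy sketch of the loop described above: find an input the LLM judge
# wrongly scores as 100%, add it to the judge's training set labeled 0%,
# "retrain", and repeat. Everything here is a hypothetical stand-in.

import itertools

def make_judge(known_exploits: set[str]):
    """Return a toy judge: 0% on patched exploits, 100% on everything else."""
    def judge(solution: str) -> float:
        return 0.0 if solution in known_exploits else 1.0
    return judge

def find_adversarial_example(judge) -> str:
    """Toy adversarial search over an unbounded stream of junk strings.
    Returns the first candidate the current judge still scores as 100%."""
    for i in itertools.count():
        candidate = f"gibberish-exploit-{i}"
        if judge(candidate) == 1.0:
            return candidate

training_set: set[str] = set()
for round_ in range(5):
    judge = make_judge(training_set)               # "retrain" on patched set
    exploit = find_adversarial_example(judge)      # still fools the judge
    training_set.add(exploit)                      # label it 0% and repeat
    print(f"round {round_}: patched {exploit!r}, "
          f"{len(training_set)} patched so far, infinitely many remain")

The toy makes the same point as above: each round patches one exploit, but the stream of candidates never runs out.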
So I bet you the labs are trying.
I don't know, actually. I still think we need other ideas.
So there's this idea of reviewing a solution and coming up with synthetic examples such that, when you train on them, you get better; you meta-learn it in some way.
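One way to read that idea, as a hypothetical sketch (these three functions are assumptions for illustration, not any published method):

# Hypothetical sketch of "review and meta-learn": the model reviews one
# of its own solutions, distills the review into synthetic training
# examples, and is then trained on those. All three functions are stubs.

def review_solution(model: str, problem: str, solution: str) -> str:
    """Stub for an LLM call that critiques its own solution."""
    return f"Critique of solution to {problem!r}: step 2 skipped a check."

def synthesize_examples(review: str) -> list[tuple[str, str]]:
    """Stub: turn the critique into (prompt, target) training pairs."""
    return [("What did the review flag?", review)]

def finetune(model: str, examples: list[tuple[str, str]]) -> str:
    """Stub for a gradient update; here we just tag the model name."""
    return model + f"+ft({len(examples)} examples)"

model = "base-model"
review = review_solution(model, "integral of x^2", "x^3/3 + C")
examples = synthesize_examples(review)
model = finetune(model, examples)  # the hope: this generalizes, not memorizes
print(model)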
And I think there are some papers on this starting to pop up. I'm only at the stage of reading abstracts, because a lot of these papers, you know, they're just ideas.
Someone has to actually make it work at frontier-LLM-lab scale, in full generality.
Because when these papers pop up, they're just a little bit noisy, you know?
They're cool ideas, but I haven't actually seen anyone convincingly show that this is possible.
That said, the LLM labs are fairly closed, so who knows what they're doing now, but...
Yeah, I do think that we're missing some aspects there.
So as an example: when an LLM is reading a book, what that currently means is that we stretch out the sequence of text, and the model predicts the next token, getting some knowledge from that.
That's not really what humans do, right?
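Concretely, "reading" here is just next-token prediction over the flattened text. A minimal toy sketch, with whitespace splitting standing in for a real tokenizer:

# What "reading a book" means for an LLM today: flatten the text into one
# long token stream and train the model to predict each next token.
# The whitespace tokenizer is a stand-in for a real BPE tokenizer.

book = "It was the best of times, it was the worst of times."
tokens = book.split()          # stand-in tokenizer

context_len = 4
pairs = [
    (tokens[i:i + context_len], tokens[i + context_len])
    for i in range(len(tokens) - context_len)
]

for context, target in pairs[:3]:
    # The model sees `context` and is trained (cross-entropy) to put
    # high probability on `target`. That's the whole "reading".
    print(context, "->", target)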