And with simulation, of course, there's a domain gap.
That's not the real world.
It's something slightly different.
But with a powerful enough neural net, I think the domain gap can be bigger, because the neural net will sort of understand that even though this isn't the real world, it still has all this high-level structure that it's supposed to be able to learn from.
A hundred percent.
I just think that at some point you need a massive data set.
And then once you pre-train your massive neural net and get something like a GPT, you're able to be very efficient at training on any arbitrary new task.
So with a lot of these GPTs, you can do tasks like sentiment analysis or translation and so on just by being prompted with very few examples.
Here's the kind of thing I want you to do.
Here's an input sentence, here's the translation into German.
Input sentence, translation to German.
Input sentence, blank, and the neural net will complete the translation to German just by looking at the examples you've provided.
And so that's an example of few-shot learning happening in the activations of the neural net instead of in the weights of the neural net.
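As a concrete illustration of the prompt format being described, here is a minimal sketch; the sentence pairs, the helper function, and its name are all made up for illustration. The resulting string is what you would feed to any GPT-style completion model, which then fills in the blank translation:

```python
# Illustrative English/German pairs (invented for this sketch).
examples = [
    ("The weather is nice today.", "Das Wetter ist heute schön."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]

def build_few_shot_prompt(pairs, query):
    """Lay out input/translation pairs, then leave the final translation blank."""
    lines = ["Translate English to German."]
    for src, tgt in pairs:
        lines.append(f"English: {src}")
        lines.append(f"German: {tgt}")
    lines.append(f"English: {query}")
    lines.append("German:")  # the model completes this line in-context
    return "\n".join(lines)

print(build_few_shot_prompt(examples, "How are you?"))
```

The key point is that no weights change here: the task is specified entirely inside the prompt, so the "learning" happens in the model's activations as it reads the examples.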
And so I think, basically, just like humans, neural nets will become very data efficient at learning any new task.
But at some point, you need a massive data set to pre-train your network.
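To make the pre-train-then-adapt recipe concrete, here is a toy PyTorch sketch; the architecture, the random stand-in data, and the task sizes are all invented for illustration, not anything from the conversation. The expensive pre-training stage happens once, and the new task then reuses the learned features with only a handful of examples:

```python
import torch
import torch.nn as nn

# Shared feature extractor (a stand-in for the "massive neural net").
backbone = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)

# --- Stage 1: pre-training on a massive data set (random stand-in data) ---
head_pretrain = nn.Linear(256, 1000)
opt = torch.optim.Adam(
    list(backbone.parameters()) + list(head_pretrain.parameters()), lr=1e-3
)
for _ in range(100):  # stands in for many steps over a huge corpus
    x = torch.randn(32, 128)
    y = torch.randint(0, 1000, (32,))
    loss = nn.functional.cross_entropy(head_pretrain(backbone(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# --- Stage 2: adapt to an arbitrary new task with very little data ---
for p in backbone.parameters():
    p.requires_grad = False          # reuse the pre-trained features as-is
head_new = nn.Linear(256, 2)         # tiny task-specific head
opt_new = torch.optim.Adam(head_new.parameters(), lr=1e-3)
x_small = torch.randn(16, 128)       # only a handful of labeled examples
y_small = torch.randint(0, 2, (16,))
for _ in range(50):
    loss = nn.functional.cross_entropy(head_new(backbone(x_small)), y_small)
    opt_new.zero_grad()
    loss.backward()
    opt_new.step()
```

The design choice mirrors the claim in the conversation: almost all of the data hunger is paid in stage 1, while stage 2 only fits a small head on top of frozen features.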
I think humans definitely do. I mean, obviously we learn a lot during our lifespan, but we also have a ton of hardware that helps us at initialization, coming from evolution.
And so I think that's also a really big component.
A lot of people in the field just talk about the number of seconds a person has lived, pretending that this is a tabula rasa, sort of like a zero initialization of a neural net.
And it's not.