Andrej Karpathy
And then when you pre-train your massive neural net and get something that, you know, is like a GPT or something, then you're able to be very efficient at training any arbitrary new task.
So a lot of these GPTs, you can do tasks like sentiment analysis or translation or so on just by being prompted with very few examples.
Here's the kind of thing I want you to do.
Here's an input sentence, here's the translation into German.
Input sentence, translation to German.
Input sentence, blank, and the neural net will complete the translation to German just by looking at the example you've provided.
And so that's an example of few-shot learning happening in the activations of the neural net instead of the weights of the neural net.
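To make the prompt layout described here concrete, the following is a minimal sketch of how such a few-shot translation prompt might be assembled. The function name `build_few_shot_prompt`, the "English:"/"German:" labels, and the example sentences are illustrative assumptions, not something stated in the conversation, and no particular model or API is implied. The point is only that the examples sit in the input text, so the model's weights are never updated.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    source_label: str = "English",  # assumed label, for illustration only
    target_label: str = "German",   # assumed label, for illustration only
) -> str:
    """Lay out solved examples followed by one unsolved input."""
    blocks = [
        f"{source_label}: {src}\n{target_label}: {tgt}"
        for src, tgt in examples
    ]
    # The final block leaves the target blank for the model to complete.
    blocks.append(f"{source_label}: {query}\n{target_label}:")
    return "\n\n".join(blocks)


# Hypothetical example pairs; any handful of solved cases would do.
examples = [
    ("Good morning.", "Guten Morgen."),
    ("Thank you very much.", "Vielen Dank."),
]
prompt = build_few_shot_prompt(examples, "Where is the train station?")
print(prompt)
```

The resulting string would be passed as input to a pre-trained language model, which is expected to continue the text after the final "German:" label.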
And so I think, basically, just like humans, neural nets will become very data efficient at learning any other new task.
But at some point, you need a massive data set to pre-train your network.
I mean, obviously humans learn a lot during our lifespan, but we also have a ton of hardware that helps us at initialization, coming from sort of evolution.
And so I think that's also a really big component.
A lot of people in the field, I think, just talk about the number of seconds that a person has lived, pretending that this is a tabula rasa, sort of like a zero initialization of a neural net.
And it's not.
You can look at a lot of animals, like, for example, zebras.
Zebras get born, and they see, and they can run.
There's zero training data in their lifespan.
They can just do that.
So somehow, I have no idea how, evolution has found a way to encode these algorithms and these neural net initializations that are extremely good into ATCGs.
And I have no idea how this works, but apparently it's possible, because here's a proof by existence.