Andrej Karpathy
Podcast Appearances
Neural networks have done language modeling for a long time.
So really what's new and exciting is just realizing that when you scale it up with a powerful enough neural net, a transformer, you get all these emergent properties.
Basically, if you have a large enough data set of text and your task is predicting the next word, you are multitasking a huge number of different kinds of problems.
You are multitasking understanding of, you know, chemistry, physics, human nature.
Lots of things are sort of clustered in that objective.
It's a very simple objective, but actually you have to understand a lot about the world to make that prediction.
Yeah, so basically it gets a thousand words and it's trying to predict the thousand-and-first.
And in order to do that very, very well over the entire data set available on the internet, you actually have to basically kind of understand the context of what's going on in there.
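
As a rough illustration of the objective described here, the sketch below slides a short context window over a toy sentence, pairs each context with the word that follows it, and scores a simple count-based predictor by its average negative log-likelihood. Everything in it is an assumption made for illustration: the helper names (`make_examples`, `fit_counts`, `average_nll`), the two-word context, and the count table standing in for a transformer. Real models use contexts of a thousand or more tokens and learned weights rather than counts.

```python
import math
from collections import Counter, defaultdict

def make_examples(tokens, context_len):
    """Slide a window over the text: each example is (context, next word)."""
    return [
        (tuple(tokens[i:i + context_len]), tokens[i + context_len])
        for i in range(len(tokens) - context_len)
    ]

def fit_counts(examples):
    """Count how often each next word follows each context (toy stand-in for a model)."""
    counts = defaultdict(Counter)
    for context, target in examples:
        counts[context][target] += 1
    return counts

def average_nll(counts, examples):
    """Average negative log-probability assigned to the true next word.

    Only valid for contexts that appear in `counts`; here we score the
    same examples we counted, so every context has been seen.
    """
    total = 0.0
    for context, target in examples:
        seen = counts[context]
        prob = seen[target] / sum(seen.values())
        total -= math.log(prob)
    return total / len(examples)

text = "the cat sat on the mat and the cat sat on the rug"
tokens = text.split()
examples = make_examples(tokens, context_len=2)
counts = fit_counts(examples)
loss = average_nll(counts, examples)
print(f"{len(examples)} examples, average loss {loss:.3f}")
```

The only uncertain predictions in this toy text come after "on the", which is followed once by "mat" and once by "rug"; every other context has a single continuation, so the loss is small. Driving that number down over internet-scale text is what forces a model to pick up the context of what it is reading.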
And it's a sufficiently hard problem that if you have a powerful enough computer, like a transformer, you end up with interesting solutions.
And you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning.
The big deal with GPT, and the original paper when they published it, was that you can just sort of prompt it in various ways and ask it to do various things, and it will just kind of complete the sentence.
But in the process of just completing the sentence, it's actually solving all kinds of really interesting problems that we care about.
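
To make the idea of prompting as sentence completion concrete, here is a minimal sketch of how a task can be written as text for a model to continue. The function name `build_few_shot_prompt` and the `=>` separator are hypothetical choices for this example, not a format taken from the GPT paper, and no model is called: the code only builds the string that a language model would be asked to complete.

```python
def build_few_shot_prompt(demonstrations, query):
    """Format worked examples plus an unfinished line for the model to continue.

    `demonstrations` is a list of (input, output) pairs; the final line
    ends right after the separator, so the most natural continuation of
    the text is the answer for `query`.
    """
    lines = [f"{source} => {target}" for source, target in demonstrations]
    lines.append(f"{query} =>")
    return "\n".join(lines)

# Hypothetical English-to-French demonstrations.
prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
print(prompt)
```

The model's weights do not change here. The examples in the prompt establish a pattern, and predicting the next word after the last separator amounts to solving the task.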
I think it's doing some understanding.
In its weights, it understands, I think, a lot about the world, and it has to in order to predict the next word in a sequence.
Yeah, so I think the internet has a huge amount of data.
I'm not sure if it's a complete enough set.
I don't know that text is enough for having a sufficiently powerful AGI as an outcome.