Andrej Karpathy
But it's the first time that a neural network has been applied in that setting.
But even before neural networks, there were language models, except they were n-gram models.
So n-gram models are just count-based models.
So if you take two words and predict a third, you just count up how many times you've seen each two-word combination and what came next.
And what you predict as coming next is just what you've seen the most of in the training set.
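The counting scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration on a made-up toy corpus, not anyone's actual implementation: for every pair of consecutive words, tally what word followed, and predict whichever continuation was seen most often.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat the cat ate the fish".split()

# For each two-word context, count what word came next.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict(w1, w2):
    """Return the word most often seen after the context (w1, w2)."""
    return counts[(w1, w2)].most_common(1)[0][0]

print(predict("on", "the"))  # -> "mat": the only word ever seen after "on the"
```

Note there is no learning here at all; the "model" is just the count table, which is exactly why n-gram models hit a wall as contexts get longer and most word combinations have never been seen.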
And so language modeling has been around for a long time.
Neural networks have done language modeling for a long time.
So really what's new or interesting or exciting is just realizing that when you scale it up with a powerful enough neural net, a transformer, you get all these emergent properties. Basically, if you have a large enough data set of text, then in the task of predicting the next word you are multitasking a huge number of different kinds of problems: understanding of, you know, chemistry, physics, human nature.
Lots of things are sort of clustered in that objective.
It's a very simple objective, but actually you have to understand a lot about the world to make that prediction.
Yeah, so basically it gets a thousand words and it's trying to predict the thousand and first.
And in order to do that very, very well over the entire data set available on the internet, you actually have to basically kind of understand the context of what's going on in there.
And it's a sufficiently hard problem that if you have a powerful enough model, like a transformer, you end up with interesting solutions.
And you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning.
That was the big deal with the original GPT paper when it was published: you can just sort of prompt it in various ways and ask it to do various things, and it will just kind of complete the sentence.