Andrej Karpathy
Podcast Appearances
So people are scaling up the data sets, making them much, much bigger.
They're working on the evaluation, making the evaluation much, much bigger.
And they're basically keeping the architecture unchanged.
And that's the last five years of progress in AI, kind of.
Basically, the way GPT is trained is you just download a massive amount of text data from the internet, and you try to predict the next word in the sequence, roughly speaking.
You're predicting little word chunks, but roughly speaking, that's it.
And what's been really interesting to watch is, basically, it's a language model.
Language models have actually existed for a very long time.
There's papers on language modeling from 2003, even earlier.
Yeah, so language model, just basically the rough idea is just predicting the next word in a sequence, roughly speaking.
So there's a paper from, for example, Bengio and the team from 2003, where for the first time they were using a neural network to take, say, like three or five words and predict the next word.
And they're doing this on much smaller data sets.
And the neural net is not a transformer.
It's a multi-layer perceptron.
But it's the first time that a neural network has been applied in that setting.
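To make the idea concrete, here is a minimal sketch of a multi-layer perceptron that takes a fixed window of three words and outputs a probability for each possible next word. It is an illustration only, not the 2003 paper's exact architecture: the tiny vocabulary, the layer sizes, the random untrained weights, and the names `next_word_probs` and `rand_matrix` are all assumptions made up for this example, and bias terms are left out for brevity.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary, chosen only for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

CONTEXT = 3  # number of previous words fed to the network
EMBED = 4    # size of each word embedding
HIDDEN = 8   # hidden layer width


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these from data."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


embeddings = rand_matrix(len(vocab), EMBED)
W_hidden = rand_matrix(HIDDEN, CONTEXT * EMBED)
W_out = rand_matrix(len(vocab), HIDDEN)


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def next_word_probs(context_words):
    """Forward pass: embed the context, apply one tanh layer, then softmax."""
    # Look up each context word's embedding and concatenate them.
    x = [value for w in context_words for value in embeddings[word_to_id[w]]]
    h = [math.tanh(a) for a in matvec(W_hidden, x)]
    logits = matvec(W_out, h)
    # Numerically stable softmax over the vocabulary.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = next_word_probs(["the", "cat", "sat"])
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

Because the weights are random, the printed probabilities are close to uniform; training would adjust the weights so that words actually seen after the context get higher probability.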
But even before neural networks, there were language models, except they were using n-gram models.
So n-gram models are just count-based models.
So if you take two words and predict a third one, you just count up how many times you've seen each two-word combination and what came next.
And what you predict as coming next is just what you've seen the most of in the training set.
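The counting procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy corpus and the function names `train_trigram_counts` and `predict_next` are invented for the example, and practical n-gram models add smoothing and backoff for word pairs that never appeared in training.

```python
from collections import Counter, defaultdict


def train_trigram_counts(tokens):
    """For every pair of adjacent words, count which word came next."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def predict_next(counts, w1, w2):
    """Return the most frequently seen follower of (w1, w2), or None if unseen."""
    followers = counts.get((w1, w2))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy training set.
corpus = "the cat sat on the mat . the cat sat on the rug . the cat ran away .".split()
counts = train_trigram_counts(corpus)

print(predict_next(counts, "the", "cat"))     # "sat" follows twice, "ran" once
print(predict_next(counts, "purple", "cat"))  # pair never seen, so None
```

Here "the cat" is followed by "sat" twice and "ran" once in the training text, so the model predicts "sat".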
And so language modeling has been around for a long time.