Andrej Karpathy
Currently, it definitely looks like the transformer is taking over AI, and you can feed basically arbitrary problems into it.
And it's a general, differentiable computer, and it's extremely powerful.
And this convergence in AI has been really interesting to watch for me personally.
Definitely the zeitgeist today is just pushing.
Basically, right now, the zeitgeist is do not touch the transformer.
Touch everything else.
So people are scaling up the data sets, making them much, much bigger.
They're working on the evaluation, making the evaluation much, much bigger.
And they're basically keeping the architecture unchanged.
And that's the last five years of progress in AI, kind of.
Basically, the way GPT is trained is you just download a massive amount of text data from the internet, and you try to predict the next word in the sequence, roughly speaking.
You're actually predicting little word chunks (tokens), but roughly speaking, that's it.
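The training objective described above can be sketched numerically. This is a toy illustration of next-token prediction, not GPT itself: the model, token ids, and sizes here are all made up, and a random matrix stands in for the model's output logits. Training would minimize the average negative log-probability assigned to each actual next token.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10
tokens = np.array([3, 1, 4, 1, 5, 9, 2, 6])  # toy "text" as token ids

# Stand-in for a model's output: one row of logits per position.
# A real model would compute these from the preceding tokens.
logits = rng.normal(size=(len(tokens) - 1, vocab_size))

def next_token_loss(logits, tokens):
    # Softmax over the vocabulary at each position.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Position t is scored on how much probability it gave token t+1.
    targets = tokens[1:]
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

loss = next_token_loss(logits, tokens)
print(float(loss))  # lower is better; a perfect predictor approaches 0
```

A model that always puts nearly all probability on the correct next token drives this loss toward zero, which is exactly what gradient descent on the real objective pushes toward.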
And what's been really interesting to watch is, basically, it's a language model.
Language models have actually existed for a very long time.
There's papers on language modeling from 2003, even earlier.
Yeah, so a language model, just basically the rough idea is predicting the next word in a sequence, roughly speaking.
So there's a paper from, for example, Bengio and the team from 2003, where for the first time they were using a neural network to take, say, like three or five words and predict the next word.
And they're doing this on much smaller data sets.
And the neural net is not a transformer.
It's a multi-layer perceptron.
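The architecture just described, a fixed context of a few words fed through a multi-layer perceptron, can be sketched as a forward pass. This is a hedged toy version in the style of Bengio et al. (2003), not their exact model: every size below (vocabulary, embedding width, hidden width, context of three words) is an invented toy value, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, context, hidden = 50, 8, 3, 16

C = rng.normal(size=(vocab_size, embed_dim))         # word embedding table
W1 = rng.normal(size=(context * embed_dim, hidden))  # concatenated embeddings -> hidden
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, vocab_size))           # hidden -> logits over vocabulary
b2 = np.zeros(vocab_size)

def predict_next(word_ids):
    # Look up and concatenate the embeddings of the context words.
    x = C[word_ids].reshape(-1)
    # One hidden layer with tanh, then a softmax over the whole vocabulary.
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

p = predict_next([5, 12, 7])  # distribution over the next word, given 3 context words
print(p.shape)
```

The key contrast with a transformer is visible here: the context length is hard-wired into the weight shapes, so the model can never look further back than its fixed window.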