Andrej Karpathy
But to get further gains, I had to add a lot more data.
I had to 10x the training set.
And then I had to actually add more computational optimizations.
I had to basically train for much longer with dropout and other regularization techniques.
And so it's almost like all these things have to improve simultaneously.
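As a rough illustration of the recipe described here (more data, longer training, dropout plus other regularization), the following is a minimal PyTorch sketch. The toy dataset, model sizes, and hyperparameters are illustrative assumptions, not the actual experiment being discussed.

```python
# A minimal sketch (not the actual setup described above) of combining
# more data, a longer training run, dropout, and weight-decay regularization.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a "10x larger" dataset: synthetic regression data.
x = torch.randn(10_000, 32)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(10_000, 1)

model = nn.Sequential(
    nn.Linear(32, 128),
    nn.ReLU(),
    nn.Dropout(p=0.1),  # dropout as one regularizer
    nn.Linear(128, 1),
)

# Weight decay stands in for the "other regularization techniques".
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.MSELoss()

# "Train for much longer" just means more optimization steps over the data.
for step in range(2_000):
    idx = torch.randint(0, x.size(0), (256,))
    loss = loss_fn(model(x[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```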
So we're probably going to have a lot more data.
We're probably going to have a lot better hardware.
We're probably going to have a lot better kernels and software.
We're probably going to have better algorithms.
And all of those, it's almost like no one of them is winning too much.
All of them are surprisingly equal.
And this has kind of been the trend for a while.
So I guess to maybe answer your question, I expect algorithmic differences from what's happening today.
But I do also expect that some of the things that have stuck around for a very long time will probably still be there.
It's probably still a giant neural network trained with gradient descent.
That would be my guess.
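To make "a giant neural network trained with gradient descent" concrete in miniature, here is a sketch of the bare update rule, w <- w - lr * dL/dw, applied to a tiny two-layer network. The sizes and learning rate are made up for illustration.

```python
# A miniature sketch of a neural network trained with plain gradient descent.
# All shapes and the learning rate are illustrative assumptions.
import torch

torch.manual_seed(0)

# Tiny two-layer network with parameters held as raw tensors.
w1 = torch.randn(8, 16, requires_grad=True)
w2 = torch.randn(16, 1, requires_grad=True)

x = torch.randn(64, 8)
y = torch.randn(64, 1)

lr = 0.01
for step in range(100):
    h = torch.tanh(x @ w1)          # hidden layer
    pred = h @ w2                   # output layer
    loss = ((pred - y) ** 2).mean() # mean squared error
    loss.backward()
    with torch.no_grad():
        # The gradient descent update itself.
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```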
But I guess what was shocking to me is everything needs to improve across the board.
Architecture, optimizer, loss function: all of it has also improved across the board, forever.
So I kind of expect all those changes to be alive and well.
Building NanoChat?