Andrej Karpathy
So I almost feel like we are redoing a lot of the...
cognitive tricks that evolution came up with through a very different process.
But we're, I think, going to converge on a similar architecture cognitively.
Well, the way I like to think about it is, okay, let's apply translation invariance in time, right?
So 10 years ago, where were we?
2015, we had convolutional neural networks primarily.
Residual networks just came out.
So remarkably similar, I guess, but quite a bit different still.
I mean, Transformer was not around.
You know, all these sort of like more modern tweaks on the Transformer were not around.
So maybe some of the things that we can bet on in 10 years, by this sort of translational equivariance, is that we're still training giant neural networks with a forward pass, a backward pass, and an update through gradient descent.
But maybe it looks a little bit different.
And it's just everything is much bigger.
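To make that concrete, here is a minimal sketch of the loop he is describing (forward pass, backward pass, update through gradient descent), written in PyTorch. The model, data, and hyperparameters are placeholders for illustration, not anything specific from the conversation.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; any differentiable model works the same way.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    logits = model(x)           # forward pass
    loss = loss_fn(logits, y)   # scalar training loss
    optimizer.zero_grad()
    loss.backward()             # backward pass: compute gradients
    optimizer.step()            # update: one gradient descent step
    return loss.item()

# e.g. train_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```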
Actually, recently, I also went back all the way to 1989, which was kind of a fun exercise for me a few years ago, because I was reproducing Yann LeCun's 1989 convolutional network, which was, as far as I'm aware, the first modern neural network trained via gradient descent on digit recognition.
And I was just interested in, okay, how can I modernize this?
How much of this is algorithms?
How much of this is data?
How much of this progress is compute and systems?
And I was able to very quickly halve the error rate, just by knowing, by time traveling by 33 years.
So if I time travel the algorithms by 33 years, I could adjust what Yann LeCun did in 1989, and I could basically halve the error.
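For reference, a rough, hypothetical sketch of a small 1989-style convolutional network for digit recognition in PyTorch is below. The layer shapes are illustrative approximations in the spirit of LeCun's 1989 design (16x16 inputs, two small conv layers, 10 output classes), not the exact architecture or the modernizations Karpathy describes.

```python
import torch
import torch.nn as nn

class TinyConvNet1989(nn.Module):
    """Illustrative 1989-style convnet; shapes are assumptions, not the original spec."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2),   # 16x16 -> 8x8 feature maps
            nn.Tanh(),
            nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2),  # 8x8 -> 4x4 feature maps
            nn.Tanh(),
            nn.Flatten(),
            nn.Linear(12 * 4 * 4, 30),
            nn.Tanh(),
            nn.Linear(30, 10),  # 10 digit classes
        )

    def forward(self, x):  # x: (batch, 1, 16, 16)
        return self.net(x)

# e.g. logits = TinyConvNet1989()(torch.randn(8, 1, 16, 16))  # -> (8, 10)
```

A network like this can be trained with exactly the forward/backward/update loop sketched earlier, which is the point of the time-travel comparison: the training recipe is recognizably the same, only the scale and the details around it changed.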