
Andrej Karpathy

👤 Speaker
3433 total appearances


Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

Residual networks had just come out. So, remarkably similar, I guess, but quite a bit different still. I mean, the Transformer was not around. You know, all these sort of more modern tweaks on the Transformer were not around.

So maybe some of the things that we can bet on, I think, in 10 years, by a sort of translational equivariance, is that we're still training giant neural networks with a forward pass, a backward pass, and an update through gradient descent. But maybe it looks a little bit different, and everything is just much bigger.
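
The loop being bet on here is the standard one. Below is a minimal PyTorch sketch of that forward/backward/update cycle; the tiny model, random data, and hyperparameters are placeholders, not anything from the interview:

```python
import torch
import torch.nn as nn

# Toy setup: a small MLP and random data, purely as placeholders.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784)         # fake batch of inputs
y = torch.randint(0, 10, (64,))  # fake labels

for step in range(100):
    logits = model(x)            # forward pass
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()              # backward pass: compute gradients
    opt.step()                   # update through gradient descent
```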

Actually, recently I also went back all the way to 1989, which was kind of a fun exercise for me a few years ago, because I was reproducing Yann LeCun's 1989 convolutional network, which was the first neural network I'm aware of that was trained via gradient descent, like a modern neural network, on digit recognition. And I was just interested in: okay, how can I modernize this? How much of this is algorithms? How much of this is data? How much of this progress is compute and systems?
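
Karpathy's actual reproduction is public at github.com/karpathy/lecun1989-repro. As a rough sketch only, a 1989-scale convnet in modern PyTorch looks something like the following; the layer sizes are illustrative approximations, not the exact numbers from the 1989 paper:

```python
import torch
import torch.nn as nn

# A rough modern approximation of a 1989-scale convnet: a couple of
# small conv layers with tanh, then a tiny dense head. Sizes are
# illustrative, not the exact 1989 configuration.
class Net1989ish(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2),   # 16x16 -> 8x8
            nn.Tanh(),
            nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2),  # 8x8 -> 4x4
            nn.Tanh(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(12 * 4 * 4, 30),
            nn.Tanh(),
            nn.Linear(30, 10),  # 10 digit classes
        )

    def forward(self, x):  # x: (batch, 1, 16, 16) digit images
        return self.head(self.features(x))

net = Net1989ish()
print(sum(p.numel() for p in net.parameters()))  # on the order of ~10k weights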

And I was able to very quickly halve the error rate, just by time traveling by 33 years. So if I time travel 33 years by algorithms alone, I could adjust what Yann LeCun did in 1989, and I could basically halve the error.
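
The kinds of purely algorithmic changes that 33 years of hindsight buys are things like swapping the loss function and optimizer. The snippet below illustrates that flavor of change with assumed, generic choices (cross-entropy instead of MSE, Adam instead of plain SGD); the exact set of tweaks is in Karpathy's write-up, not reproduced here:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the 1989 net.
net = nn.Sequential(nn.Flatten(), nn.Linear(256, 30), nn.Tanh(), nn.Linear(30, 10))

# 1989-style choices: mean-squared error on squashed outputs, plain SGD.
old_loss = nn.MSELoss()
old_opt = torch.optim.SGD(net.parameters(), lr=0.03)

# Typical modernizations: softmax cross-entropy on raw logits, Adam.
# Illustrative only; see Karpathy's write-up for the actual changes.
new_loss = nn.CrossEntropyLoss()
new_opt = torch.optim.Adam(net.parameters(), lr=3e-4)
```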

But to get further gains, I had to add a lot more data: I had to 10x the training set. And then I had to actually add more computational optimizations; I had to basically train for much longer, with dropout and other regularization techniques.
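
A sketch of those remaining levers, more data, dropout regularization, and a much longer training run; all numbers and the noise augmentation are illustrative placeholders:

```python
import torch
import torch.nn as nn

# All numbers here are illustrative placeholders.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256, 64), nn.Tanh(),
    nn.Dropout(p=0.25),  # regularization: randomly zero activations during training
    nn.Linear(64, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(640, 1, 16, 16)   # stand-in for a 10x-larger training set
y = torch.randint(0, 10, (640,))

for step in range(10_000):        # "train for much longer"
    idx = torch.randint(0, x.size(0), (64,))
    xb = x[idx] + 0.03 * torch.randn_like(x[idx])  # light noise augmentation
    loss = loss_fn(model(xb), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```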

And so it's almost like all these things have to improve simultaneously. So we're probably going to have a lot more data.