
Andrej Karpathy

👤 Speaker
3419 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy — AGI is still a decade away

So I almost feel like we are redoing a lot of the cognitive tricks that evolution came up with through a very different process. But we're, I think, going to converge on a similar architecture cognitively.

Well, the way I like to think about it is, okay, let's apply translation invariance in time, right? So 10 years ago, where were we? 2015, we had convolutional neural networks primarily. Residual networks had just come out. So remarkably similar, I guess, but quite a bit different still. I mean, the Transformer was not around. You know, all these sort of more modern tweaks on the Transformer were not around.

So maybe some of the things that we can bet on, I think, in 10 years, by translational equivariance, is that we're still training giant neural networks with a forward pass, a backward pass, and an update through gradient descent. But maybe it looks a little bit different. And it's just that everything is much bigger.
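
He's describing the basic training recipe he expects to persist: a forward pass, a backward pass, and a gradient-descent update. A minimal sketch of that loop, assuming PyTorch and a toy model with random stand-in data (none of which come from the episode):

```python
# Minimal forward / backward / update loop. The model, data, and
# hyperparameters are illustrative placeholders, not anything from the talk.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784)          # stand-in batch of inputs
y = torch.randint(0, 10, (64,))   # stand-in labels

for step in range(100):
    logits = model(x)             # forward pass
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()               # backward pass: compute gradients
    opt.step()                    # gradient-descent update
```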

Actually, recently, I also went back all the way to 1989, which was kind of a fun exercise for me a few years ago, because I was reproducing Yann LeCun's 1989 convolutional network, which was the first neural network I'm aware of trained via gradient descent, a modern-style neural network trained by gradient descent on digit recognition. And I was just interested in, okay, how can I modernize this? How much of this is algorithms? How much of this is data? How much of this progress is compute and systems?
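
The network being reproduced is a small convolutional net trained by gradient descent on digit images. A rough sketch of a network in that spirit, assuming PyTorch; the 16x16 input size, layer widths, and tanh activations are illustrative guesses, not a faithful copy of the 1989 architecture:

```python
# Tiny convolutional digit classifier, roughly in the spirit of a 1989-era
# network: a couple of small conv layers with tanh, then a small MLP head.
# All sizes here are assumptions for illustration.
import torch
import torch.nn as nn

class TinyDigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2), nn.Tanh(),   # 16x16 -> 8x8
            nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2), nn.Tanh(),  # 8x8 -> 4x4
            nn.Flatten(),
            nn.Linear(12 * 4 * 4, 30), nn.Tanh(),
            nn.Linear(30, 10),
        )

    def forward(self, x):
        return self.net(x)

model = TinyDigitNet()
x = torch.randn(8, 1, 16, 16)   # stand-in batch of 16x16 digit images
print(model(x).shape)           # torch.Size([8, 10])
```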

And I was able to very quickly, like, halve the error rate, just by knowing things from time traveling by 33 years. So if I time travel by algorithms, 33 years forward, I could adjust what Yann LeCun did in 1989, and I could basically halve the error.
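
He attributes the improvement to 33 years of algorithmic progress rather than to data or compute. A hedged sketch of what such modernizations might look like, building on the toy network above; the specific choices here (Adam instead of plain SGD, simple shift augmentation) are assumptions for the sake of example, not a record of the actual changes made:

```python
# Example "modern" tweaks applied to the toy 1989-style setup above. These
# particular choices are illustrative assumptions, not the exact changes
# described in the episode.
import torch
import torch.nn.functional as F

model = TinyDigitNet()                  # toy network from the sketch above
x = torch.randn(8, 1, 16, 16)           # stand-in digit images
y = torch.randint(0, 10, (8,))          # stand-in labels

opt = torch.optim.Adam(model.parameters(), lr=3e-4)  # Adam instead of plain SGD

def augment(imgs):
    # Shift the whole batch by up to one pixel in each direction.
    dy, dx = torch.randint(-1, 2, (2,)).tolist()
    return torch.roll(imgs, shifts=(dy, dx), dims=(2, 3))

for step in range(200):
    loss = F.cross_entropy(model(augment(x)), y)  # modern cross-entropy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```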