Andrej Karpathy

Speaker
3419 total appearances

Podcast Appearances

Lex Fridman Podcast
#333 – Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI

And so the way to look at it, I think, is because of the residual pathway in the backward pass, the gradients sort of flow along it uninterrupted because addition distributes the gradient equally to all of its branches.

So the gradient from the supervision at the top just flows directly to the first layer.
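
The point about addition passing the gradient through can be illustrated with a toy calculation. This is a minimal sketch, not anything from the episode: it assumes scalar layers whose branch is a simple multiplication by a slope, and the name `chain_gradient` is hypothetical. A residual layer `y = x + a * x` has local derivative `1 + a`, while a plain layer `y = a * x` has local derivative `a`.

```python
def chain_gradient(branch_slopes, residual):
    """Gradient of the output w.r.t. the input for a stack of scalar layers.

    Plain layer:    y = a * x      -> local derivative a
    Residual layer: y = x + a * x  -> local derivative 1 + a
    The chain rule multiplies the local derivatives together.
    """
    grad = 1.0
    for a in branch_slopes:
        grad *= (1.0 + a) if residual else a
    return grad


# Assumed toy setting: 20 layers, each branch with a small slope of 0.1.
slopes = [0.1] * 20
plain_grad = chain_gradient(slopes, residual=False)    # shrinks toward zero
residual_grad = chain_gradient(slopes, residual=True)  # stays well above zero

# With branches that contribute nothing, the residual stack passes the
# gradient to the first layer unchanged.
identity_grad = chain_gradient([0.0] * 20, residual=True)

print(plain_grad, residual_grad, identity_grad)
```

In the plain stack the gradient reaching the first layer is a product of small numbers, whereas the `1` contributed by each addition keeps the residual product from collapsing.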

And all the residual connections are arranged so that in the beginning during initialization, they contribute nothing to the residual pathway.
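
One way to picture a branch that contributes nothing at initialization is a residual block whose branch weights start at zero. The sketch below assumes the simplest possible case (a single linear map initialized to exactly zero); real implementations may use a small non-zero scale instead, and `residual_block` is a hypothetical name used only for illustration.

```python
def residual_block(x, weight):
    """y = x + branch(x), where the branch is one linear map (a toy stand-in)."""
    branch = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return [xi + bi for xi, bi in zip(x, branch)]


dim = 3
# Assumption: the branch weights are exactly zero at initialization.
zero_weight = [[0.0] * dim for _ in range(dim)]

x = [1.0, -2.0, 0.5]
h = x
for _ in range(20):  # 20 stacked blocks
    h = residual_block(h, zero_weight)

# Every block adds zero, so the whole stack starts out as the identity.
print(h == x)
```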

Mm-hmm.

So what it kind of looks like is, imagine the transformer is kind of like a Python function, like a def.

And you get to do various kinds of lines of code.

Say you have a 100-layer-deep transformer; typically they would be much shorter, say 20.

So you have 20 lines of code and you can do something in them.

And so during the optimization, basically what it looks like is first you optimize the first line of code, and then the second line of code can kick in, and the third line of code can kick in.

And I feel like because of the residual pathway and the dynamics of the optimization, you can learn a very short algorithm that gets the approximate answer, but then the other layers can kick in and start to create a contribution.

And at the end of it, you're optimizing over an algorithm that is 20 lines of code, except these lines of code are very complex because it's an entire block of a transformer.

You can do a lot in there.
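
The "Python function" picture can be written down quite literally. This is only a sketch of the analogy under toy assumptions: the input is a scalar, each "line of code" is an arbitrary callable standing in for a full transformer block, and the names `transformer` and `blocks` are hypothetical.

```python
def transformer(x, blocks):
    """Each loop iteration is one 'line of code': add a block's output to x."""
    for block in blocks:
        x = x + block(x)
    return x


# Assumed toy state early in training: only the first line does anything,
# and the remaining 19 lines contribute nothing yet.
blocks = [lambda x: 0.5 * x] + [lambda x: 0.0] * 19
early_output = transformer(2.0, blocks)

# Later, a second line "kicks in" and adds its own contribution on top.
blocks[1] = lambda x: 0.1 * x
later_output = transformer(2.0, blocks)

print(early_output, later_output)
```

Because every line only adds to the running value, a short algorithm in the first few lines already gives an answer, and later lines can refine it without having to relearn it.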

What's really interesting is that this transformer architecture actually has been remarkably resilient.

Basically, the transformer that came out in 2016 is the transformer you would use today, except you reshuffle some of the layer norms.

The layer normalizations have been reshuffled to a pre-norm formulation.
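
The difference between the two arrangements can be sketched in a few lines. This is a simplified illustration, not a full implementation: `layer_norm` here has no learned scale or shift, `sublayer` is a hypothetical stand-in for attention or an MLP, and the function names are assumptions made for the example.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x, sublayer):
    # Post-norm: normalize after the residual addition,
    # so the residual path itself passes through the normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x, sublayer):
    # Pre-norm: normalize only the input to the branch,
    # leaving the residual path untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def silent(v):
    """A branch that contributes nothing, as at an idealized initialization."""
    return [0.0] * len(v)


x = [1.0, 2.0, 4.0]
pre_out = pre_norm_block(x, silent)    # input comes through unchanged
post_out = post_norm_block(x, silent)  # input is rescaled by the norm

print(pre_out == x, post_out == x)
```

With a silent branch, the pre-norm block is an exact identity, while the post-norm block still alters the signal on the residual path.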

And so it's been remarkably stable, but there's a lot of bells and whistles that people have attached to it and tried to improve it.

I do think that basically it's a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture.

And I think people have been trying to change it, but it's proven remarkably resilient.

But I do think that there should be even better architectures potentially.