Andrej Karpathy
I think it's a deterministic system.
I think that things that look random, like say the collapse of the wave function, et cetera, I think they're actually deterministic, just entanglement and so on, and some kind of a multiverse theory, something, something.
It feels like it.
Yeah, you're interpreting the choice and you're creating a narrative for having made it.
Well, the one that I've been thinking about the most recently is probably the transformer architecture.
So basically, neural networks have had a lot of architectures that were trendy and have come and gone for different sensory modalities. For vision, audio, and text, you would process them with different-looking neural nets. And recently we've seen this convergence toward one architecture, the transformer. You can feed it video, or you can feed it images or speech or text, and it just gobbles it up. It's kind of like a bit of a general-purpose computer that is also trainable and very efficient to run on our hardware.
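[Editor's note: the unifying mechanism being described is self-attention. A minimal sketch of scaled dot-product self-attention in NumPy, under the assumption of a single head and randomly initialized projection weights; real transformers add multiple heads, masking, and learned parameters.]

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise token similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ v                              # each token is a weighted mix of values

# Tokens from any modality (image patches, audio frames, text pieces) are
# embedded as vectors of the same dimension, so one block handles them all.
rng = np.random.default_rng(0)
T, d = 4, 8                                         # 4 tokens, 8-dim embeddings
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # -> (4, 8)
```

The same operation, stacked with feed-forward layers, is what gets trained end to end regardless of the input modality.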
And so this paper came out in 2017, I want to say.
Yeah, I'm not sure if the authors were aware of the impact that that paper would go on to have.
Probably they weren't.
But I think they were aware of some of the motivations and design decisions behind the Transformer, and they chose not to, I think, expand on it in that way in the paper.
And so I think they had an idea that there was more to it
than just the surface of, oh, we're just doing translation and here's a better architecture.
You're not just doing translation.
This is like a really cool, differentiable, optimizable, efficient computer that you've proposed.
And maybe they didn't have all of that foresight, but I think it's really interesting.
"Attention Is All You Need."
Yeah, it's like a meme or something, basically.
Honestly, yeah, there is an element of me that honestly agrees with you and prefers it this way.
Yes.