Andrej Karpathy
Yeah, so text by itself, I'm a little bit suspicious about.
There's a ton of things we don't put in writing, just because they're obvious to us: how the world works, the physics of it, that things fall.
We don't put that stuff in text because why would you?
We share that understanding.
And so text is a communication medium between humans, and it's not an all-encompassing medium of knowledge about the world.
But as you pointed out, we do have video, and we have images, and we have audio.
And so I think that definitely helps a lot.
But we haven't trained models sufficiently across all those modalities yet.
So I think that's what a lot of people are interested in.
Yes, I think that's probably the final frontier for a lot of these models. As you mentioned, when I was at OpenAI, I was working on this project, World of Bits, and basically it was the idea of giving neural networks access to a keyboard and a mouse.
What could possibly go wrong?
So basically you perceive the screen pixels as input, and the state of the computer is sort of visualized for human consumption in images of the web browser and stuff like that.
And then you give the neural network the ability to press keys and use the mouse.
And we're trying to get it to, for example, complete bookings and interact with user interfaces.
Yeah.
Well, it's the universal interface in the digital realm, I would say.
And there's a universal interface in the physical realm, which in my mind is a humanoid form factor kind of thing.
We can later talk about Optimus and so on.
But I feel like there's kind of a similar philosophy in some way, where the physical world is designed for the human form, and the digital world is designed for the human form of seeing the screen and using keyboard and mouse.