Andrej Karpathy
Podcast Appearances
But in the process of just completing the sentence, it's actually solving all kinds of really interesting problems that we care about.
I think it's doing some understanding.
In its weights, it understands, I think, a lot about the world, and it has to in order to predict the next word in a sequence.
Yeah, so I think the internet has a huge amount of data.
I'm not sure if it's a complete enough set.
I don't know that text is enough for having a sufficiently powerful AGI as an outcome.
Yeah, so text by itself, I'm a little bit suspicious about.
There's a ton of things we don't put in writing, just because they're obvious to us: how the world works, the physics of it, that things fall.
We don't put that stuff in text because why would you?
We share that understanding.
And so text is a communication medium between humans, and it's not an all-encompassing medium of knowledge about the world.
But as you pointed out, we do have video, and we have images, and we have audio.
And so I think that definitely helps a lot.
But we haven't trained models sufficiently across all those modalities yet.
So I think that's what a lot of people are interested in.
Yes, I think that's probably the final frontier for a lot of these models. As you mentioned, when I was at OpenAI, I was working on this project, World of Bits, and basically it was the idea of giving neural networks access to a keyboard and a mouse.
What could possibly go wrong?
So basically you perceive the screen pixels as input, and the state of the computer is sort of visualized for human consumption in images of the web browser and stuff like that.
And then you give the neural network the ability to press keys and use the mouse.
And we're trying to get it to, for example, complete bookings and interact with user interfaces.
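The loop described here (screen pixels in, keyboard and mouse actions out) can be sketched roughly as below. This is a toy illustration only, not the actual World of Bits code: `ToyBrowser`, `KeyPress`, `MouseClick`, `policy`, and `run_episode` are all hypothetical names, and the "policy" is a hand-written stand-in for a neural network.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical action types for illustration; not a real World of Bits API.
@dataclass
class KeyPress:
    key: str


@dataclass
class MouseClick:
    x: int
    y: int


Action = Union[KeyPress, MouseClick]
Screen = List[List[int]]  # grayscale pixel grid, values in 0-255


class ToyBrowser:
    """Stand-in environment: a dark screen with a single bright 'button' pixel."""

    def __init__(self, width: int = 8, height: int = 4, button=(5, 2)):
        self.width, self.height = width, height
        self.button = button
        self.typed = ""
        self.done = False

    def observe(self) -> Screen:
        # The computer's state is exposed only as pixels, as it would be to a human.
        screen = [[0] * self.width for _ in range(self.height)]
        bx, by = self.button
        screen[by][bx] = 255
        return screen

    def step(self, action: Action) -> None:
        if isinstance(action, KeyPress):
            self.typed += action.key
        elif isinstance(action, MouseClick) and (action.x, action.y) == self.button:
            self.done = True  # the task counts as complete once the button is clicked


def policy(screen: Screen) -> Action:
    """Placeholder for a learned network: click the brightest pixel."""
    _, x, y = max(
        (value, x, y) for y, row in enumerate(screen) for x, value in enumerate(row)
    )
    return MouseClick(x, y)


def run_episode(env: ToyBrowser, max_steps: int = 10) -> bool:
    """Repeat observe -> act until the task is done or the step budget runs out."""
    for _ in range(max_steps):
        env.step(policy(env.observe()))
        if env.done:
            return True
    return False
```

Calling `run_episode(ToyBrowser())` runs the observe-act loop on the toy screen; a real system would replace `policy` with a trained model and `ToyBrowser` with an actual browser.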