Sergey Levine
Podcast Appearances
There's the sky.
There are clouds moving around, the water, cars driving around, people.
If you want to predict everything that will happen in the future, you can do so in many different ways.
You can say, okay, there's people around, so let me get really good at understanding, like, the psychology of how people behave in crowds and predict the pedestrians.
But you could also say, like, well, there's clouds moving around.
Let me, like, understand everything about water molecules and ice particles in the air.
And you can go super deep on that.
Right.
If you want to fully understand down to the subatomic level everything that's going on, as a person, you could spend decades just thinking about that, and you'll never even get to the pedestrians or the water, right?
So if you want to really predict everything that's going on in that scene, there's just so much stuff that even if you're doing a really great job and capturing, like, 100% of something, by the time you get to everything else, you know, ages will have passed.
Whereas with text, it's already been abstracted into those bits that we as humans care about.
So the representations are already there, and they're not just good representations.
They actually focus in on what really matters.
Okay, so that's the bad news.
Here's the good news.
The good news is that we don't have to just get everything out of, like, pointing a camera outside this building.
Because when you have a robot, that robot is actually trying to do a job.
So it has...