Sergey Levine
Podcast Appearances
And then at some point, we actually realized that we can actually make significant headway once the model was good enough by supervising it, not just with low-level actions, but actually literally instructing it through language.
Now, you need a certain level of competence before you can do that, but once you have that level of competence, just standing there and telling the robot, okay, now pick up the cup, put the cup in the sink, put the dish in the sink, just with words, already actually gives the robot information that it can use to get better.
Now, imagine what this implies for the human plus robot dynamic.
Like now, basically, for these systems, learning is not just learning from real action.
It's also learning from words, and eventually it will be learning from observing what people do, from the kind of natural feedback that you receive when you're doing a job together with somebody else.
And this is also the kind of stuff where the prior knowledge that comes from these big models is tremendously valuable, because that lets you understand that interaction dynamic.
So I think that there's a lot of potential for these kinds of human plus robot deployments to make the model better.
Yeah, that's a really good question.
So one of the big things that is different now than it was in 2009 actually has to do with the technology for machine learning systems that understand the world around them.
Principally, for autonomous driving, this is perception.
For robots, it can mean a few other things as well.
And perception certainly was not in a good place in 2009.
The trouble with perception is that it's one of those things where you can nail a really good demo with a somewhat engineered system, but hit a brick wall when you try to generalize it.
Now, at this point in 2025, we have much better technology for generalizable and robust perception systems and, more generally, generalizable and robust systems for understanding the world around us.
Like when you say that a system is scalable, in machine learning, scalable really means generalizable.
So that gives us a much better starting point today.
So that's not an argument about robotics being easier than autonomous driving.