Sergey Levine
It's very hard to do because robotic experience consists of time steps that are very correlated with each other.
So the raw byte representation is enormous, but the information density is probably comparatively low.
Maybe a better comparison is to the data sets that are used for multimodal training.
And there, I believe the last time we did that count, the gap was between one and two orders of magnitude.
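To make the correlation point concrete, here is a back-of-envelope sketch; the frame rate, horizon, and "one informative frame per second" assumption are hypothetical numbers chosen only for illustration, not figures from the conversation.

```python
# Hypothetical back-of-envelope: raw robot logs are huge in bytes, but if adjacent
# time steps are highly correlated, the number of effectively independent samples
# is far smaller. All numbers here are illustrative assumptions.

def effective_samples(hours: float, hz: float, frames_per_independent_sample: float) -> int:
    """Treat roughly one frame per correlation window as carrying new information."""
    raw_frames = hours * 3600 * hz
    return int(raw_frames / frames_per_independent_sample)

raw = 1000 * 3600 * 30                    # 1000 hours at 30 Hz -> 108,000,000 raw frames
useful = effective_samples(1000, 30, 30)  # assume ~1 informative frame per second
print(raw, useful)                        # 108,000,000 raw vs 3,600,000 effective
```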
So the vision you have of robotics will not be possible until you collect, what, 100x or 1000x more data?

Well, that's the thing we don't know. It's certainly very reasonable to infer that robotics is a tough problem, and that it probably requires as much experience as the language stuff. But because we don't know the answer to that, to me a much more useful way to think about it is not:
How much data do we need to get before we're fully done?
But how much data do we need to get before we can get started?
Meaning, before we can get a data flywheel going that represents self-sustaining and ever-growing data collection.
Learning on the job, or acquiring data in a way that the process of acquiring that data is itself useful and valuable.
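As a way to pin down what a self-sustaining data flywheel could mean operationally, here is a minimal sketch; the policy, collector, and trainer interfaces are hypothetical placeholders, not any particular system's API.

```python
# Minimal sketch of the "data flywheel" loop described above. Everything here is a
# hypothetical placeholder; the point is only the structure: each deployment round
# both does useful work and grows the dataset, so collection becomes self-sustaining.

from typing import Callable, List

def data_flywheel(
    policy: object,
    collect_episodes: Callable[[object], List[dict]],       # robots run the current policy
    update_policy: Callable[[object, List[dict]], object],  # retrain on all data so far
    num_rounds: int = 10,
) -> object:
    dataset: List[dict] = []
    for _ in range(num_rounds):
        episodes = collect_episodes(policy)      # data gathered while doing real work
        dataset.extend(episodes)                 # deployment itself grows the dataset
        policy = update_policy(policy, dataset)  # improved policy goes out next round
    return policy
```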
I see.
Like, just some kind of RL.
Like doing something actually real.
Yeah.
I mean, ideally, I would like it to be RL because you can get away with the robot acting autonomously.
Which is easier.
But it's not out of the question that you can have mixed autonomy.
As I mentioned before, robots can learn from all sorts of other signals.
I described how we can have a robot that learns from a person talking to it.
So there's a lot of middle ground in between fully teleoperated robots and fully autonomous robots.
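To illustrate that middle ground between fully teleoperated and fully autonomous data, here is a hedged sketch in which a training pipeline weights episodes by their supervision source, including spoken feedback from a person; the episode fields, sources, and weights are illustrative assumptions, not a description of any particular system.

```python
# Hypothetical mixed-autonomy data: teleoperated demonstrations, autonomous rollouts,
# and episodes annotated with language feedback can all feed training, just with
# different levels of trust. Field names and weights are illustrative only.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Episode:
    observations: list
    actions: list
    source: str                               # "teleop" or "autonomous"
    language_feedback: Optional[str] = None   # e.g. a person talking to the robot

def training_weight(ep: Episode) -> float:
    """Hypothetical weighting: trusted teleop data counts most; autonomous data
    counts less unless it comes with corrective language feedback."""
    if ep.source == "teleop":
        return 1.0
    if ep.language_feedback is not None:
        return 0.7
    return 0.3
```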
Yeah, so the current model that we have is basically a vision-language model that has been adapted for motor control.
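As a structural illustration of adapting a vision-language model for motor control, here is a rough sketch in PyTorch: a pretrained VLM backbone produces a joint image-and-instruction embedding, and a small action head maps it to continuous motor commands. The backbone interface, dimensions, and head design are assumptions for illustration, not the actual model being described.

```python
# Sketch only: attach a continuous-action head to a pretrained vision-language
# backbone. The backbone's call signature, embedding size, and action dimension
# are hypothetical placeholders.

import torch
import torch.nn as nn

class VLMWithActionHead(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int = 768, action_dim: int = 7):
        super().__init__()
        self.backbone = backbone              # pretrained vision-language model
        self.action_head = nn.Sequential(     # small new head for motor control
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),       # e.g. end-effector pose + gripper
        )

    def forward(self, images: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images, text_tokens)  # joint embedding, shape [B, embed_dim]
        return self.action_head(features)              # continuous action, shape [B, action_dim]
```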