
Sergey Levine

👤 Speaker
700 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Fully autonomous robots are much closer than you think – Sergey Levine

So to give you a little bit of a fanciful brain analogy: a VLM, a vision language model, is basically an LLM that has had a little pseudo visual cortex grafted onto it, a vision encoder, right?

So our models, they have a vision encoder, but they also have an action expert, an action decoder essentially.

So it has like a little visual cortex and notionally a little motor cortex.

And the way that the model actually makes decisions is it reads in the sensory information from the robot.

It does some internal processing and that could involve actually outputting intermediate steps.

Like you might tell it clean up the kitchen and it might think to itself like, hey, to clean up the kitchen, I need to pick up the dish and I need to pick up the sponge and I need to put this and this.

And then eventually it works its way through that chain of thought generation down to the action expert, which actually produces continuous actions.
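The decision loop described here — read sensory input, optionally generate intermediate text reasoning, then hand off to an action expert that outputs continuous actions — can be sketched as follows. All class and function names below are illustrative stand-ins for this structure, not the actual model:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image_summary: str   # stands in for camera frames fed to the vision encoder
    instruction: str     # e.g. "clean up the kitchen"

def backbone_reason(obs):
    # Stand-in for the VLM backbone: decompose the instruction into
    # intermediate text steps (the "chain of thought" mentioned above).
    return [f"to '{obs.instruction}': pick up the dish",
            f"to '{obs.instruction}': pick up the sponge"]

def action_expert(obs, thoughts):
    # Stand-in for the action decoder: emits a continuous action chunk
    # (here a fixed-length list of joint targets) rather than text tokens.
    return [0.0] * 7  # 7-DoF arm action, purely illustrative

def decide(obs):
    # The full pipeline: sensors -> intermediate reasoning -> continuous actions.
    thoughts = backbone_reason(obs)
    actions = action_expert(obs, thoughts)
    return thoughts, actions
```

The point of the split is visible even in the sketch: the backbone's output is text, while the action expert's output lives in a continuous space with a fixed dimensionality.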

And that has to be a different module because the actions are continuous.

They're high frequency, so they have a different data format than text tokens.

But structurally, it's still an end-to-end transformer.

And roughly speaking, technically, it corresponds to a kind of mixture of experts architecture.

That's right.

With the exception that the actions are actually not represented as discrete tokens.

It actually uses flow matching, a kind of diffusion, because they're continuous and you need to be very precise with your actions for dexterous control.
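The flow-matching idea can be shown in a minimal numpy sketch: actions are generated by starting from Gaussian noise and integrating a velocity field from t=0 to t=1. Here an oracle velocity field (which already knows the target action chunk) stands in for the learned action expert; in the real model that field would be a network conditioned on vision and language:

```python
import numpy as np

def velocity_field(x, t, target):
    # Oracle velocity for the linear path x_t = (1 - t) * noise + t * target:
    # along that path, dx/dt = (target - x) / (1 - t).
    # A trained action expert would predict this quantity instead.
    return (target - x) / (1.0 - t)

def sample_actions(target, steps=10, seed=0):
    # Start from Gaussian noise and integrate dx/dt = v(x, t)
    # with simple Euler steps from t = 0 to t = 1.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_field(x, t, target)
    return x
```

Because the outputs are real-valued vectors rather than token IDs, this sampler has no discretization step at all, which is the precision argument made above.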

Yeah, so one theme here that I think is important to keep in mind is that

The reason that those building blocks are so valuable is because the AI community has gotten a lot better at leveraging prior knowledge.

And a lot of what we're getting from the pre-trained LLMs and VLMs is prior knowledge about the world.

And it's a little bit abstracted knowledge.

You can identify objects.

You can figure out roughly where things are in an image, that sort of thing.