Sergey Levine
But somehow information about the world needs to get injected into the system.
So, I think the way you pose this question actually elucidates this very nicely.
In robotics, classically, people have often thought about simulation as a way to inject human knowledge because a person knows how to write down differential equations.
They can code it up, and that gives the robot more knowledge than it had before.
But I think that increasingly what we're learning from experience in other fields, from how video generation has gone, from synthetic data for LLMs, is that probably the most powerful way to create synthetic experience is from a really good model.
Because, you know, the model probably knows more than a person does about those fine-grained details.
But then, of course, where does that model get its knowledge? From experiencing the world.
So, in a sense, what you said, I think, is actually quite right in that a very powerful AI system can simulate a lot of stuff.
But also, at that point, it kind of almost doesn't matter because viewed as a black box, what's going on with that system is that information comes in and capability comes out.
And whether the way it processes that information is by imagining some stuff and simulating or by some model-free method is kind of irrelevant in understanding its capabilities.
Well, yeah, I mean, certainly when you sleep, your brain does stuff that looks an awful lot like what it does when it's awake, that looks an awful lot like playing back experience or perhaps generating new statistically similar experience.
And so I think it's very reasonable to guess that simulation through a learned model is part of how your brain figures out counterfactuals, basically.
But something that's even more fundamental than that is that optimal decision-making, at its core, regardless of how you do it, requires considering counterfactuals.
You basically have to ask yourself, if I did this instead of that, would it be better?
And you have to answer that question somehow.
And whether you answer that question by using a learned simulator, or by using a value function or a reward model, in the end it's kind of all the same.
Like, as long as you have some mechanism for considering counterfactuals and figuring out which counterfactual is better, you've got it.
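The equivalence described here can be sketched in a few lines of code. This is a toy illustration, not anything from an actual system: the functions `q`, `model`, and `r` below are stand-ins for learned components, and the one-step lookahead is a deliberate simplification of model-based planning.

```python
def pick_action_model_free(state, actions, q_value):
    # Model-free: a learned value function answers the counterfactual
    # "if I did this instead of that, would it be better?" directly.
    return max(actions, key=lambda a: q_value(state, a))

def pick_action_model_based(state, actions, dynamics_model, reward_fn):
    # Model-based: imagine the next state under each action with a
    # learned simulator, then score the imagined outcome.
    return max(actions, key=lambda a: reward_fn(dynamics_model(state, a)))

# Toy setup (purely illustrative): the state is a position on a line,
# and the goal is to get close to position 10.
q = lambda s, a: -abs((s + a) - 10)   # stand-in for a learned Q-function
model = lambda s, a: s + a            # stand-in for a learned dynamics model
r = lambda s: -abs(s - 10)            # stand-in for a learned reward model

actions = [-1, 0, 1]
print(pick_action_model_free(0, actions, q))           # 1
print(pick_action_model_based(0, actions, model, r))   # 1
```

Both mechanisms pick the same action, because viewed as a black box each one is just a way of ranking counterfactuals; only the internal machinery differs.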
Yeah.