Stefano Ermon
So, you know, the agent can type, the agent can click, the agent can, you know, check out an item for you or can book a flight for you.
So it needs to take a bunch of actions, let's say on a website, or maybe in your apps or in some apps on your phone, depending on what the environment is.
Yeah, yeah, exactly.
So existing models, yeah, they take a mix of structured information about what kind of menus are available and where the buttons are, and images, and then they use that and map it to an action.
So you can imagine something similar where instead you have a diffusion model that processes the same inputs but produces the answer not one token at a time but through this refinement process. That makes a lot of sense, because people are already using diffusion policies and flow-based policies.
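A minimal sketch of producing an answer by refinement instead of one token at a time, in the spirit of masked discrete-diffusion samplers. This is an illustration under stated assumptions, not Mercury's actual decoding algorithm: `toy_denoiser`, its made-up `CONFIDENCE` scores, and the `per_step` schedule are hypothetical stand-ins for a trained model. Each call proposes tokens for every position at once and commits only the most confident ones, so positions can be filled out of order and a 6-token action takes 3 model calls instead of 6.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical denoiser: given the partially filled sequence (None = masked),
# it returns a (token, confidence) guess for every position in a single call.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def refine_decode(denoiser: Denoiser, length: int, per_step: int) -> Tuple[List[str], int]:
    """Start fully masked and commit the `per_step` most confident
    guesses per model call until every position is filled."""
    seq: List[Optional[str]] = [None] * length
    calls = 0
    while any(tok is None for tok in seq):
        guesses = denoiser(seq)  # one call proposes tokens for all positions
        calls += 1
        masked = [i for i, tok in enumerate(seq) if tok is None]
        # Keep the most confident proposals; the rest stay masked for the next round.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = guesses[i][0]
    return [tok for tok in seq if tok is not None], calls


# Hypothetical target: an action sequence a web agent might emit.
TARGET = ["click", "search", "type", "SFO", "click", "book"]
# Made-up per-position confidences (not from any real model).
CONFIDENCE = [0.9, 0.4, 0.8, 0.3, 0.7, 0.6]


def toy_denoiser(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    # Stand-in for a trained model: it already "knows" the answer.
    return list(zip(TARGET, CONFIDENCE))


tokens, calls = refine_decode(toy_denoiser, len(TARGET), per_step=2)
print(tokens)  # ['click', 'search', 'type', 'SFO', 'click', 'book']
print(calls)   # 3 (one-token-at-a-time decoding would need 6)
```

With these confidences, the first call fixes positions 0 and 2, the second fixes 4 and 5, and the last fills in 1 and 3.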
Like, if you look at RL and robotics right now, one of the best approaches for controlling robots, for implementing the policy that decides what actions the robot takes, is based on diffusion.
It's based on flow models, more or less the same thing.
And so that's another kind of like data point that really excites me and gives me even more confidence that we are on the right track and we really need to do this at some point.
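A minimal sketch of how a flow-based policy can turn an observation into an action: start from Gaussian noise and integrate a velocity field with a few Euler steps. Everything task-specific here is a toy assumption. `target_action` and the closed-form `velocity` are hypothetical stand-ins for a network that, in practice, would be trained (for example with flow matching) on demonstrations. Because this toy velocity follows a straight-line path from noise to the target, Euler integration lands on the target action.

```python
import numpy as np


def target_action(obs: np.ndarray) -> np.ndarray:
    # Hypothetical toy task: the "right" action moves the end effector
    # back toward the origin.
    return -obs


def velocity(x: np.ndarray, t: float, obs: np.ndarray) -> np.ndarray:
    # Stand-in for a learned network v(x, t, obs). On the straight-line path
    # x_t = (1 - t) * x0 + t * a, the velocity pointing at a is (a - x_t) / (1 - t).
    return (target_action(obs) - x) / (1.0 - t)


def sample_action(obs: np.ndarray, steps: int = 10, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(obs.shape)  # start from pure noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        x = x + dt * velocity(x, t, obs)  # one Euler refinement step
    return x


obs = np.array([0.5, -1.0, 2.0])
print(sample_action(obs))  # approximately [-0.5, 1.0, -2.0]
```

The design choice mirrors the refinement idea in the conversation: the action is not emitted piece by piece but produced by repeatedly refining a full noisy guess.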
So those trade-offs change in the sense that at the end of the day, hallucinations happen because you're building a statistical model.
And so whenever you fit a statistical model to data, there is a certain regime where you're interpolating, but then you might need to extrapolate, and then mistakes happen unless you have a perfect model, and a perfect model is never actually possible.
And because you're fitting a different model, even if you use the same data, you're going to get a different kind of behavior.
So it's going to interpolate, it's going to extrapolate in different ways.
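A minimal sketch of the interpolation-versus-extrapolation point, using two ordinary curve fits rather than language models: a straight line and a cubic, both fit with NumPy's `Polynomial.fit` to the same samples on [0, 1]. The data-generating function is made up for illustration. Inside the training range the two models nearly agree; outside it they diverge sharply. Each model's errors there depend on the model family, not just on the data.

```python
import numpy as np

# Same training data for both models: made-up samples on [0, 1] only.
x_train = np.linspace(0.0, 1.0, 20)
y_train = x_train + 0.1 * np.sin(2 * np.pi * x_train)

# Two different model families fit to identical data.
line = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
cubic = np.polynomial.Polynomial.fit(x_train, y_train, deg=3)

x_inside = np.linspace(0.0, 1.0, 50)   # interpolation regime
x_outside = np.linspace(2.0, 3.0, 50)  # extrapolation regime

# Largest disagreement between the two fitted models in each regime.
gap_inside = np.max(np.abs(line(x_inside) - cubic(x_inside)))
gap_outside = np.max(np.abs(line(x_outside) - cubic(x_outside)))
print(f"max disagreement inside:  {gap_inside:.3f}")
print(f"max disagreement outside: {gap_outside:.3f}")
```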
What we're seeing is that, you know, they still make mistakes.
If you try a Mercury model, it's not perfect.
It does hallucinate.
But it does so in different ways.
And it's hard to quantify how.