Alex Imas
Right, exactly.
Yeah.
Google recently announced Gemini Omni and its video editing capabilities are incredible.
You can upload a video and then tell Omni to do things like change the background or adjust the lighting or add or remove elements, all while keeping everything else consistent.
But Omni isn't just a video editor.
I got a chance to sit down with the research and product team behind Omni and I learned that it's a preview of how future frontier models will be trained.
It can take in any kind of input, whether that's text or audio or video.
And while it doesn't currently do so, architecturally, it's capable of just as seamlessly outputting images or text.
So it's really a bet on the multimodal data transfer hypothesis.
The model becomes better at predicting one data type by seeing the others.
For example, Omni is really good at accurately rendering text on video, even though Google didn't specifically target that capability in this model.
And Omni is the next step towards more accurate world models.
Because in order to predict the next frame of a video, you have to have a deep understanding of physics and spatial dynamics.
As Omni progresses, it'll be interesting to see whether it can close the sim-to-real gap.
Because it's much harder to collect data in the real world than it is in simulation, robotics progress has lagged other applications of AI.
But if you have really good video models that can simulate reality, maybe that stops being the case.
In the meantime, if you want to try Omni, you can check it out in the Gemini app at gemini.google or use it in Google's AI creative studio, Flow, at flow.google.
We were talking a second ago about why there isn't more automation as a result of LLMs.
And one plausible mechanism could be what you were describing with the O-ring. O-ring theory refers to the fact that the Challenger shuttle blew up because one component malfunctioned and destroyed the whole thing.
And maybe that's a more general model of how goods are produced in the economy, that you've got to make sure everything is reliable and works well.
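To make the O-ring intuition concrete, here is a minimal sketch, assuming a process whose steps must all succeed and fail independently of one another. The function name, the 20-step count, and the reliability figures are hypothetical illustrations, not numbers from the conversation.

```python
from math import prod


def chain_success_probability(step_reliabilities):
    """Probability that every step succeeds, assuming independent steps.

    In an O-ring style process a single failed step ruins the whole output,
    so the per-step reliabilities multiply rather than average.
    """
    return prod(step_reliabilities)


# Hypothetical example: a 20-step process at two per-step reliability levels.
for q in (0.99, 0.95):
    p = chain_success_probability([q] * 20)
    print(f"per-step reliability {q:.2f} -> whole process succeeds {p:.1%}")
```

Under these assumptions, 20 steps that are each 99% reliable succeed together roughly 82% of the time, while 95% per-step reliability drops that to roughly 36%, which is why one weak component can dominate the outcome.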