Nathaniel Whittemore
Although, again, how it stacks up compared to Kimi K2.5, which we will talk about in our main episode, remains to be seen.
Lastly today, it's not just the Chinese labs with some interesting new products to show off.
Google has released a new feature for Gemini 3 Flash called Agentic Vision.
The feature combines Gemini's state-of-the-art multimodal reasoning with code execution to unlock new capabilities.
Writes Google, Agentic Vision introduces an agentic think-act-observe loop into image understanding tasks.
First is Think.
The model analyzes the user query and the initial image, formulating a multi-step plan.
Next is Act.
The model generates and executes Python code to actively manipulate images, such as cropping, rotating, or annotating, or to analyze them, such as running calculations, counting bounding boxes, etc.
Last is Observe.
The transformed image is appended to the model's context window.
This allows the model to inspect the new data with better context before generating a final response.
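As a rough illustration of the think-act-observe loop described above, here is a minimal Python sketch. It is not Google's implementation and does not call any Gemini API. The image is a toy grid of numbers, the plan is hard-coded where a real model would generate it, and every name here (Context, think, act, observe, agentic_vision_loop) is hypothetical.

```python
from dataclasses import dataclass, field


def crop(img, top, left, height, width):
    """Return the sub-grid starting at (top, left) with the given size."""
    return [row[left:left + width] for row in img[top:top + height]]


def rotate90(img):
    """Rotate the grid clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


@dataclass
class Context:
    """Stand-in for the model's context window: the query plus every image seen."""
    query: str
    images: list = field(default_factory=list)


def think(ctx):
    # Assumption: a real model would derive this plan from the query and image.
    # Here it is a fixed two-step plan so the sketch stays runnable.
    return [("crop", (0, 1, 2, 2)), ("rotate90", ())]


def act(img, step):
    # Execute one planned operation on the most recent image.
    name, args = step
    ops = {"crop": crop, "rotate90": rotate90}
    return ops[name](img, *args)


def observe(ctx, img):
    # Append the transformed image to the context so later steps can inspect it.
    ctx.images.append(img)


def agentic_vision_loop(query, image):
    ctx = Context(query=query, images=[image])
    for step in think(ctx):
        transformed = act(ctx.images[-1], step)
        observe(ctx, transformed)
    return ctx


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    result = agentic_vision_loop("zoom in on the top-right corner", grid)
    for img in result.images:
        print(img)
```

The point of the sketch is the shape of the loop: each transformed image goes back into the context, so the final answer can draw on the cropped or rotated view and not only the original.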
Overall, this promises to improve Gemini's ability to annotate images, perform data visualization tasks, and help with basic image analysis.
Google said that the loop improves model performance by between 5 and 10% across most vision benchmarks.
Still, developer experience lead Omar Sanseviero hinted at the most exciting unlock from the new feature.
He showed an output of an annotated image of a table containing a spill.
Gemini had identified a spill, a piece of cloth, and several other items.
The annotations appear to be instructions for a robot to clean up the spill by first clearing away the items in the way, dampening the cloth, and wiping up the spill.
The implication, of course, is that this feature could be used to give robots on-the-fly analysis and reasoning ability, allowing them to tackle tasks that they've never seen before.
Ultimately though, as I said, when it comes to new models, the big conversation is around Kimi K2.5.
And so with that, we will wrap up the headlines and move on to the main episode.
Hello, friends.