Nathaniel Whittemore
Now this was, of course, the big unlock from NanoBanana 2, but as we'll see with some of the early examples, it seems like this model takes it to the next level.
In terms of the capabilities they highlight, right at the top is greater precision and control.
They point out that it can do small text, iconography, tiny UI elements, dense compositions, and it can do so at significant resolution up to 2K.
The practical effect, they say, is instead of getting something vaguely in the neighborhood of what you meant, you get something you can actually use.
One example they gave of this is this pile of rice, with a tiny kernel in the middle that has a little bitty GPT image 2 written on it.
This model is multilingual, which they say not only helps with translation, but can also, quote, "generate visually coherent outputs where language is a part of the design itself, from posters and explainers to diagrams and comics."
Some of the other things they point out are much enhanced stylistic sophistication and better realism, including, they note, adding tiny flaws that make images feel more real. They also discuss this idea of real-world intelligence, pointing out that it doesn't just mean it's cool; it unlocks a set of use cases like explainers, maps, educational graphics, and visual summaries, where, as they put it, correctness and clarity matter just as much as aesthetics.
Now, one last thing they note on the utility side of things is that the model can now generate in a more flexible set of aspect ratios, which just gives people more fine-grained control.
Now in terms of community response, the first thing that I noticed is that a lot of people, including, by the way, Sam Altman himself in a recent interview, came into this new model effectively feeling like, for all intents and purposes, image generation was solved.
Or at least that if it could get better, it would only be incrementally better in a way that didn't really matter practically.
For many, that perspective has now been blown out of the water.
Ethan Mollick writes, "I didn't think that better image generators would be a big deal, but it turns out that there is a quality threshold I didn't expect where you can now get text slides in academic papers."
And as people dug in, there were a few themes that I saw over and over and over again.
The first was how much less AI-ness a lot of these photos had: you really could get not just pretty images, but very realistic-looking images, including things like not-so-great regular photography.
Pietro Schirano shared the output of a photo of a computer screen displaying a Spotify playlist at night, adding, "...it's an insane model and a true imagination engine with an incredible level of realism and small details."
Now, while the realism is impressive, a lot of people jumped straight to the implications of the massively improved text and detail handling.
It unlocks things like entire comic panels.
And by the way, the ability to generate multiple images and to keep character consistency makes larger editorial generation along these lines much more possible as well.
The detail in text was shown off in all sorts of different ways.