Mohammad Norouzi
Podcast Appearances
release some more exciting features next.
So this is just the first release.
We're just testing the waters, figuring out how to work with Hugging Face and the open source community, ComfyUI, et cetera.
What I'm personally most excited about is something we haven't released yet, which is editable text and layout control.
And I really believe for a lot of design and marketing use cases, we need editable design, not a single flat image.
And we haven't released that yet.
We cannot show the teaser in our video, but I'm personally most excited about that.
On the technical side, what we've done is we went really detailed on the prompting.
And if you look at our prompts, they're thousands of words long, describing each element in the image and where it is in the image.
We have layout control, bounding box, and a number of elements.
And that's one of the key innovations here that unlocks a lot of, again, design use cases because you clearly want font control, you want layout control.
And this model is very versatile; it allows you to really fix certain elements, fix positioning, and control the image generation in every detail possible.
I don't know if you remember the very first model we released three years ago.
And at the time, image generation was synonymous with garbled text.
And there were memes about DALL-E 2 generating travel posters with incorrect city names, which is fun to look at.
So I remember at the time, we were just a few people building these models.
And the question was, how can we differentiate?
What's unique about our model?
And we said, okay, text generation, accurate text is something we have.