The a16z Show

AI, Design, and the Power of Open Models

15 Jun 2026

Transcription

Transcript generated automatically by AI and may contain errors.

Chapter 1: What are image generation models and their significance?

5.667 - 20.123 Mohammad Norouzi

It's not about how good a model is in the general sense. It's about how good is this model for my use case. For a lot of design and marketing use cases, we need editable design, not a single flat image.

20.143 - 27.912 Justine Moore

It's super impressive, honestly, reaching the level of things like Nano Banana or GPT Image with an open source model. Why did you think that was important?

27.892 - 43.099 Mohammad Norouzi

We really want our models to have taste. Every artist, they can really customize this model to the nuances of their style, the texture of their canvas, and really get 2K output and hopefully make that part of their workflow.

43.079 - 56.743 Yoko Li

One thing we were always wondering is that this release, the open source model is so small. It's 9.3 billion parameters. Like previously, a SOTA is probably like 80 billion parameters. It's like 9x the difference. How did you do it?

57.224 - 78.153 Chris Duffy

We focused on... Image generation has improved dramatically over the last few years. The next challenge is not simply creating images, but giving users more control over what gets created and how. That includes everything from typography and layouts to editing, customization, and workflows that fit into professional creative processes.

Chapter 2: Why did Ideogram decide to release an open-weight model?

79.014 - 90.889 Chris Duffy

Yoko Li and Justine Moore speak with Ideogram founder and CEO Mohammad Norouzi about image generation, open-weight models, design tools, and the future of creative AI.

93.417 - 104.529 Justine Moore

So today we're excited to have Mohammad, CEO and founder of Ideogram, a Toronto-based generative AI company that just released their first open weights image model. Congrats on the huge release.

104.77 - 106.892 Mohammad Norouzi

Congratulations. Thanks for having me.

106.912 - 119.526 Justine Moore

We're really excited to talk through something that everyone has been buzzing about, which is the fact that the model is open weights. The previous Ideogram models have been closed source, so would love to hear how you made the call to make it open this time.

119.506 - 146.94 Mohammad Norouzi

What has happened is there has been a lot of progress in the industry, and we used to do everything. Basically, we had our own first-party app as well as our own first-party API, and model development itself is a lot of work. And we decided to focus a little more on the model side. We think that's where a lot of potential exists. We still want to continue to own the interaction with the users.

147 - 163.134 Mohammad Norouzi

We think there is a lot of important feedback we can get from the users directly, but then we want to focus more on building the model. And by releasing the weights, we're actually extending ourselves and working with inference providers, working more directly with large enterprises.

Chapter 3: How do AI models improve controllability in design?

163.676 - 180.642 Mohammad Norouzi

They have every ability to customize the models or host it on-prem or optimize it for device. And we would love to work with the best chip makers to really optimize the model, the best inference providers.

181.022 - 194.282 Mohammad Norouzi

So this is basically us saying, hey, we are very serious about building the foundation model and we would like to work with you wherever you are, whether you're an app developer or a chip maker or an inference provider.

194.752 - 214.914 Yoko Li

I think you already kind of touched on this. The new open source model is very exciting in that it unlocked a lot of new use cases. It's very photorealistic. I think it can generate up to 2K with a smaller model too. Obviously, there's very precise layout control as well. Do you want to talk about some of the net new use cases that's unlocked by this model?

214.934 - 237.499 Mohammad Norouzi

So this is actually a foundation based on which we're going to release some more exciting features next. So this is just the first release. We're just testing the waters, figuring out how to work with Hugging Face and the open source community, ComfyUI, et cetera. What I'm personally most excited about is something we haven't released yet, which is editable text and layout control.

Chapter 4: What challenges exist in generating text and layouts within images?

237.479 - 260.905 Mohammad Norouzi

And I really believe for a lot of design and marketing use cases, we need editable design, not a single flat image. And we haven't released that yet. We cannot show the teaser in our video, but I'm personally most excited about that. On the technical side, what we've done is we went really detailed on the prompting.

261.458 - 288.732 Mohammad Norouzi

And if you look at our prompts, it's like thousands of words, each element in the image, where it is in the image. We have layout control, bounding box, and a number of elements. And that's one of the key innovations here that unlocks a lot of, again, design use cases because you clearly want font control, you want layout control. And this model is very versatile, allows you to really

288.712 - 296.224 Mohammad Norouzi

fix certain elements, fix positioning and control the image generation in every detail possible.
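
To make the idea of per-element layout control concrete, here is a minimal sketch of a structured prompt that lists each element with a bounding box. The field names (`kind`, `description`, `bbox`, `font`), the normalized coordinate convention, and the top-level `elements` key are all assumptions made for illustration; the conversation does not describe Ideogram's actual prompt schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """One element of a layout prompt (hypothetical schema)."""
    kind: str                                 # e.g. "text" or "image"
    description: str                          # what the element should show
    bbox: Tuple[float, float, float, float]   # normalized (x0, y0, x1, y1)
    font: Optional[str] = None                # only meaningful for text


def bbox_is_valid(bbox: Tuple[float, float, float, float]) -> bool:
    """A box must lie inside the unit square and have positive area."""
    x0, y0, x1, y1 = bbox
    return 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0


def build_prompt(elements: List[Element]) -> str:
    """Serialize the elements to JSON, rejecting malformed boxes."""
    for el in elements:
        if not bbox_is_valid(el.bbox):
            raise ValueError(f"invalid bounding box for {el.description!r}")
    return json.dumps({"elements": [asdict(el) for el in elements]}, indent=2)


if __name__ == "__main__":
    poster = [
        Element("text", "headline: Visit Toronto", (0.1, 0.05, 0.9, 0.25), "serif"),
        Element("image", "skyline at dusk", (0.0, 0.3, 1.0, 1.0)),
    ]
    print(build_prompt(poster))
```

Keeping each element's position explicit is what would let a caller pin one element in place while changing another.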

297.267 - 312.066 Justine Moore

Amazing. And one of the other things I immediately noticed about the model was how you can render super long text, like paragraphs of text, completely accurately, which you either give the model in the prompt or you ask the model to come up with something and it does it really well.

312.627 - 323.882 Justine Moore

And it's super impressive, honestly, reaching the level of things like Nano Banana or GPT Image with an open source model. Was that something you guys really focused on? And sort of why did you think that was important?

323.997 - 328.284 Mohammad Norouzi

I don't know if you remember, but the very first model we released three years ago.

Chapter 5: How can customization and editing enhance creative workflows?

328.304 - 350.157 Mohammad Norouzi

And at the time, image generation was synonymous with garbled text. And there were memes about DALL-E 2 generating travel posters with incorrect city names, which is fun to look at. So I remember at the time, we were just a few people building these models. And the question was, how can we differentiate? What's unique about our model?

350.137 - 372.982 Mohammad Norouzi

And we said, okay, text generation, accurate text is something we have. And then we released it and we were really surprised. It's just so many people were so excited about text generation. And then we realized, oh, actually, that's the whole graphic design and storytelling industry. Text is a very important part of image generation. And that became a very important part of our brand.

373.002 - 388.388 Mohammad Norouzi

So if you search Ideogram, people talk about the quality of typography, the quality of text. We are known for typography, really stylized typography for logos, t-shirt design, graphic design in general. And so we continue to push forward.

Chapter 6: What trade-offs exist between general-purpose and specialized models?

388.428 - 405.017 Mohammad Norouzi

I think our previous model wasn't really beating the state of the art in text generation, but we continue to focus on that. And we had a bunch of research breakthroughs. And with this model, despite the fact that it's very tiny, the text generation is very, very accurate.

405.841 - 426.625 Yoko Li

One of the things that stood out to us, which is what the community has been chatting about, is how there's new ways of processing data as you're training the model, which is like you kind of let the model learn what is a bounding box and how to do the layering and color palettes. Do you want to talk more about some of the innovations you had during the training process?

427.146 - 430.87 Yoko Li

What made this model so good with these differentiating features?

431.305 - 453.951 Mohammad Norouzi

Yeah, it's kind of difficult to exactly describe what resulted in such an amazing model. I think a lot of it is focus and evaluation. Evaluating image models is actually a very difficult thing to do. There are lots of benchmarks out there, but people look at them and they're like, okay, this doesn't correlate with pixel fidelity that I care about or realism.

453.991 - 472.942 Mohammad Norouzi

You don't really want novice users to judge the quality of these models because they may be looking at small monitors that aren't really adjusted for color accuracy. And they always care so much about quality for the realism and, again, text accuracy.

472.962 - 485.653 Mohammad Norouzi

So throughout training, we always measure text accuracy and we update very detailed changes to the model and data and see how that results in performance.
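
One common way to put a number on text accuracy is character error rate: the edit distance between the text the prompt asked for and the text read back from the generated image, divided by the length of the requested text. The sketch below assumes that metric and assumes the rendered string comes from some OCR step that is not shown; the conversation does not say which metric or tooling Ideogram actually uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def char_error_rate(target: str, rendered: str) -> float:
    """Edits needed to turn `rendered` into `target`, per target character."""
    if not target:
        return 0.0 if not rendered else 1.0
    return edit_distance(target, rendered) / len(target)


if __name__ == "__main__":
    # `rendered` stands in for OCR output from a generated poster (assumed).
    print(char_error_rate("TORONTO", "TORNTO"))
```

Tracking a score like this across checkpoints is one way to see whether a change to the model or data helped or hurt text rendering.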

Chapter 7: How does the model ensure quality in typography and text generation?

485.974 - 511.24 Mohammad Norouzi

So I would say a lot of it is really listing all the possible changes and very carefully tuning each element of the model and see what happens. Obviously, we try to gather as much data as possible. One of the standard recipes in the industry is that we take images and we turn them to text using visual language models.

511.941 - 535.972 Mohammad Norouzi

The very first models we were training three, four years ago would be based on the alt text that you can find on the internet. That is, each image on the internet may have an alt text field associated with it, which describes what's in the image. But the problem is the alt text is often very short or inaccurate. And what we do now is we train models to go from image to text.

535.992 - 556.631 Mohammad Norouzi

And in this case, image to text with detailed bounding box information, detailed element information. If we care about text, we really want to make sure all the text in the image is correctly described. And then we go from text to image backwards. It's kind of interesting. We gather all the images from the internet. Some of them may have alt text, some of them may not have alt text.

Chapter 8: What future developments can we expect from image generation models?

556.651 - 567.591 Mohammad Norouzi

And then we use AI to go from image to text. And then we train another AI model to go from text to image. So that's one of the key recipes that results in very good models.
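
The recipe described here, captioning every collected image with a model and then training text-to-image on the resulting pairs, can be sketched as a small data-preparation step. Everything named below (`WebImage`, `build_training_pairs`, the stub captioner) is hypothetical and stands in for real components; the stub only returns a placeholder string so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class WebImage:
    """An image collected from the web; alt text may be missing."""
    url: str
    alt_text: Optional[str] = None


def build_training_pairs(
    images: Iterable[WebImage],
    captioner: Callable[[WebImage], str],
) -> List[Tuple[str, str]]:
    """Return (caption, image url) pairs for text-to-image training.

    Every image gets a model-written caption, whether or not it came
    with alt text, because alt text is often short or inaccurate.
    """
    return [(captioner(img), img.url) for img in images]


def stub_captioner(img: WebImage) -> str:
    """Placeholder for an image-to-text model (assumed, not a real API)."""
    return f"detailed caption for {img.url}"


if __name__ == "__main__":
    scraped = [
        WebImage("https://example.com/a.png", alt_text="poster"),
        WebImage("https://example.com/b.png"),
    ]
    for caption, url in build_training_pairs(scraped, stub_captioner):
        print(url, "->", caption)
```

In a real pipeline the captioner would be the image-to-text model mentioned above, producing descriptions with element and bounding box detail.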

567.892 - 588.032 Yoko Li

I saw a lot of JSON prompting in your technical blog, which is very unique. And as I was trying the model, it seems like it was translating the text, the prompt, to a JSON representation with implicit structure. Right. Do you think JSON is a representation for image models going forward, or do you think there's another representation there?

588.052 - 595.38 Mohammad Norouzi

It's a very good question. I don't know if you've seen the open source community is a little upset because of the safety image that shows up.

595.4 - 599.214 Justine Moore

No, no. They always are.

599.255 - 606.643 Mohammad Norouzi

Reddit was really lashing out at our engineers. And one of our people said, oh, we might fix this. And they were like, we might?

607.285 - 607.787

Oh, wow.

609.758 - 635.479 Mohammad Norouzi

But the fact is the community needs to also read the documentation and bear with us. This model is only trained with JSON prompting. And you have to provide JSON with that particular structure for you to get good quality output. So I don't know if this is a feature or bug. We did have some safety built into the model, but that is also detecting incorrect prompts.
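
Since the model expects JSON with a particular structure, a caller could check a prompt locally before sending it. The sketch below is an assumption-laden illustration: the required key `elements` is made up, and the real schema lives in the model's documentation, which is not reproduced in this conversation.

```python
import json
from typing import Tuple

# Hypothetical required top-level keys; consult the model's documentation
# for the actual schema.
REQUIRED_KEYS = {"elements"}


def check_prompt(raw: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a candidate prompt string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "top level must be a JSON object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    return True, "ok"


if __name__ == "__main__":
    for candidate in ["cat", "[]", "{}", '{"elements": []}']:
        print(repr(candidate), check_prompt(candidate))
```

A check like this would flag a one-word prompt as malformed before it ever reaches the model.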

635.459 - 662.31 Mohammad Norouzi

So if you just give it a one-word prompt, then you get this "image is blocked by safety" image back. But that's because your prompt is not a well-specified JSON. Now, we don't want people to write in JSON. We don't think that's a natural way of interacting with these models. But I do strongly believe that we need to use all the AI innovation to build the best

662.29 - 686.923 Mohammad Norouzi

image generation and editing models. And there has been a lot of progress in language models in the text space. So the question is, if you want to go from some vague idea to an image, what's the exact process? How much of the thinking happens in the language space and how much of the thinking happens in the actual kind of pixel generation space?
