Yoko Li
Podcast Appearances
One thing we were always wondering about with this release is that the open source model is so small.
It's 9.3 billion parameters.
Previously, a SOTA model was probably around 80 billion parameters.
That's roughly a 9x difference.
How did you do it?
I think you already kind of touched on this.
The new open source model is very exciting in that it unlocked a lot of new use cases.
It's very photorealistic.
I think it can generate up to 2K with a smaller model too.
Obviously, there's very precise layout control as well.
Do you want to talk about some of the net new use cases that are unlocked by this model?
One of the things that stood out to us, and that the community has been chatting about, is that there are new ways of processing data as you're training the model. You kind of let the model learn what a bounding box is and how to do the layering and color palettes.
Do you want to talk more about some of the innovations you had during the training process?
What made this model so good with these differentiating features?
I saw a lot of JSON prompting in your technical blog, which is very unique.
And as I was trying the model, it seemed like it was translating the text prompt into a JSON representation with implicit structure.
Do you think JSON is the representation for image models going forward, or do you think there's another representation there?
And for people who want control or consistency, I think that'd be key.