Stefano Ermon

Speaker
359 total appearances

Podcast Appearances

The Neuron: AI Explained
Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

So what we found is that that kind of architecture actually works pretty well as a backbone for a diffusion language model.

And so we used what we knew worked well and is well supported by existing frameworks and open-source code.

And so, yeah, the neural networks are not that different.

It's just that they're trained in a different way and used in a different way at inference time.

That's right.

Yes.

And I think perhaps that's actually suboptimal.

I mean, people have converged on transformers as being a really good architecture for autoregressive models.

These days people also use them for diffusion models; people use diffusion transformers.

So it's an architecture that's widely used across different modalities and across different kinds of generative models.

But there might be better architectures that shine once you change the generative model.

It's no longer autoregressive.

So I think the design space is different.

I think there is a lot more room for R&D and further improvements just by matching the neural network architecture to the training objective and to the inference computations that we do.

Right.

That makes sense.

Very cool.

Yeah, what's parallelized is that the network is able to modify multiple tokens at the same time.

Wow.

And so that's kind of like what you were seeing.
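The parallelism described above can be illustrated with a toy sketch. This is a hypothetical example, not Mercury's actual implementation: `predict` stands in for a neural network, and the point is only the number of network passes. A diffusion-style decoder can fill every masked position in one pass, while an autoregressive decoder needs one pass per token.

```python
# Toy sketch (hypothetical, not Mercury's implementation): a diffusion-style
# decoder updates all masked positions in parallel each step, while an
# autoregressive decoder produces one token per network pass.

MASK = "_"

def diffusion_decode(length, predict, num_steps):
    """Each step rewrites all still-masked positions at the same time."""
    tokens = [MASK] * length
    passes = 0
    for _ in range(num_steps):
        # One "network pass" fills every masked position in parallel.
        tokens = [predict(i) if t == MASK else t for i, t in enumerate(tokens)]
        passes += 1
        if MASK not in tokens:
            break
    return tokens, passes

def autoregressive_decode(length, predict):
    """One network pass per generated token."""
    tokens = []
    passes = 0
    for i in range(length):
        tokens.append(predict(i))
        passes += 1
    return tokens, passes

predict = lambda i: f"tok{i}"  # stand-in for a trained model
d_tokens, d_passes = diffusion_decode(8, predict, num_steps=1)
a_tokens, a_passes = autoregressive_decode(8, predict)
assert d_tokens == a_tokens  # same output, far fewer passes for diffusion
```

Here the parallel decoder finishes in a single pass while the autoregressive one needs eight, which is the intuition behind the speed claims in the episode title.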