Stefano Ermon
So what we found is that that kind of architecture actually works pretty well as a backbone for a diffusion language model, too.
And so we've used what we knew worked well and is well supported by existing frameworks and open-source code.
And so, yeah, the neural networks are not that different.
It's just that they're trained in a different way and used in a different way at inference time.
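A toy sketch of that point (purely illustrative; the `backbone` function below is a hypothetical stand-in for a transformer, not anyone's actual model): the same network can be paired with a next-token-prediction objective, as in autoregressive training, or with a mask-and-recover objective of the kind used in (simplified) diffusion-style training.

```python
# Toy illustration: one backbone, two training objectives.
MASK = -1  # sentinel for a corrupted/masked token

def backbone(tokens):
    # Hypothetical stand-in for a transformer forward pass: returns one
    # "predicted" token per position. Real models return logits over a
    # vocabulary; the rule used here is arbitrary.
    return [(t + 1) % 10 if t != MASK else 0 for t in tokens]

def autoregressive_targets(seq):
    # Autoregressive objective: predict token i+1 from the prefix up to i.
    # Returns (input, target) pairs for the loss.
    return list(zip(seq[:-1], seq[1:]))

def diffusion_targets(seq, masked_positions):
    # Diffusion-style (masked denoising) objective: corrupt some positions,
    # then train the network to recover the original tokens there.
    corrupted = [MASK if i in masked_positions else t
                 for i, t in enumerate(seq)]
    targets = {i: seq[i] for i in masked_positions}
    return corrupted, targets

seq = [1, 2, 3, 4, 5]
print(autoregressive_targets(seq))     # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(diffusion_targets(seq, {1, 3}))  # masked input plus what to recover
```

The backbone itself is untouched in both cases; only the corruption of the input and the prediction target change, which is the sense in which "the neural networks are not that different."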
That's right.
Yes.
And I think perhaps that's actually suboptimal.
I mean, people have kind of converged on transformers as being a really good architecture for autoregressive models.
These days people also use them for diffusion models, like diffusion transformers.
So it's an architecture that's widely used across different modalities and across different kinds of generative models.
But it's possible that there are better architectures that shine once the generative model is no longer autoregressive.
So I think the design space is different.
I think there's a lot of room for R&D and further improvements just by matching the neural network architecture to the training objective and to the computations we do at inference.
Right.
That makes sense.
Very cool.
Yeah, basically what's parallelized is that the network can modify multiple tokens at the same time.
Wow.
And so that's kind of like what you were seeing.
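A minimal sketch of the parallelism being described (toy code only; `backbone` is a hypothetical placeholder, and the update rule is made up for illustration): an autoregressive sampler calls the network once per new token, while a diffusion-style sampler fills in several masked positions with each call.

```python
# Toy comparison: network calls needed to generate 8 new tokens.
MASK = -1  # sentinel for a not-yet-generated token

def backbone(tokens):
    # Hypothetical stand-in for a transformer: proposes a token for every
    # position. Made-up rule: continue counting from the nearest filled
    # token to the left. Real models predict from full context.
    out, last = [], 0
    for t in tokens:
        if t != MASK:
            last = t
        else:
            last = (last + 1) % 10
        out.append(last)
    return out

def autoregressive(prompt, n_new):
    # One network call per generated token, strictly left to right.
    seq, calls = list(prompt), 0
    for _ in range(n_new):
        seq.append(backbone(seq + [MASK])[-1])
        calls += 1
    return seq, calls

def diffusion(prompt, n_new, tokens_per_step=4):
    # Start fully masked; each network call fills several positions at once.
    seq, calls = list(prompt) + [MASK] * n_new, 0
    while MASK in seq:
        preds = backbone(seq)
        calls += 1
        filled = 0
        for i, t in enumerate(seq):
            if t == MASK and filled < tokens_per_step:
                seq[i] = preds[i]  # multiple tokens updated per call
                filled += 1
    return seq, calls

print(autoregressive([1], 8))  # 8 network calls
print(diffusion([1], 8))       # 2 network calls
```

With 4 tokens unmasked per step, 8 new tokens cost 2 forward passes instead of 8, which is the speedup you see when multiple tokens change at the same time.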