Stefano Ermon

Speaker
359 total appearances

Podcast Appearances

The Neuron: AI Explained
Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

We figured out how to adapt the math and the underlying algorithms

from continuous spaces to discrete spaces like text and code.

And we had some really promising results at the GPT-2 scale.

So, you know, at Stanford, in university labs, you don't have access to a lot of GPUs, a lot of compute.

And so the largest model we were able to train was a GPT-2 sized model.

But basically what we did is we took a GPT-2 sized model and we trained it as a diffusion model and as an autoregressive model on the same data.

And so what we found was that the quality was the same.

So, you know, if you think about perplexity, which is the metric people usually use to figure out how well the model fits the data, what's the quality of the generations you get.

We were matching the perplexity, but we were able to be about 10 times faster.
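
As a rough illustration of the perplexity metric mentioned here, the following is a minimal Python sketch and not anything taken from the episode. It uses the standard definition: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the sample log-probabilities are hypothetical, and the sketch assumes natural-log probabilities that some model has already assigned to each token.

```python
import math


def perplexity(token_log_probs):
    """Return exp(average negative log-likelihood per token).

    token_log_probs: natural-log probabilities a model assigned to each
    observed token (hypothetical input; no real model is called here).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Illustrative values only: a model that gives every token probability 0.25
# is as uncertain as a uniform choice among 4 options.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))  # approximately 4
```

Lower perplexity means the model assigns higher probability to the observed text, which is why two models trained on the same data can be compared on it.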

And so that was super exciting to me.

And I really wanted to see what happens if you train something bigger than a GPT-2 model.

you know, is it possible to build something commercially viable?

And that's why I started the company to scale things up.

Since then, yes, yes.

Now the Mercury models that we have in production are significantly larger.

They've been trained on more data.

A lot of engineering work went into post-training the models and making sure that they would be useful for tasks that people care about, like commercial use cases of LLMs.

Yeah, so the models are still fairly large in terms of the number of parameters.

We're actually still using similar architectures.

So under the hood, it's still a transformer.