Stefano Ermon
We figured out how to adapt the math and the underlying algorithms
from continuous spaces to discrete spaces like text and code.
And we had some really promising results at the GPT-2 scale.
So, you know, at Stanford, in university labs, you don't have access to a lot of GPUs, a lot of compute.
And so the largest model we were able to train was a GPT-2 sized model.
But basically what we did is we took a GPT-2 sized model and we trained it as a diffusion model and as an autoregressive model on the same data.
And what we found was that the quality was the same.
If you think about perplexity, which is the metric people usually use to measure how well a model fits the data and what quality of generations you get,
we were matching the perplexity, but we were about 10 times faster.
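As a quick aside on the metric being compared here: perplexity is just the exponential of the average per-token negative log-likelihood the model assigns to held-out text, so it can be computed the same way for an autoregressive model and a diffusion model. A minimal sketch (the per-token values below are made-up numbers, purely for illustration):

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp of the mean per-token negative log-likelihood (in nats).

    Lower is better: it is roughly the effective number of tokens the
    model is "choosing between" at each step.
    """
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Hypothetical per-token NLLs from two models scored on the same text:
autoregressive_nlls = [2.1, 3.0, 1.4, 2.5]
diffusion_nlls      = [2.2, 2.9, 1.5, 2.4]

print(perplexity(autoregressive_nlls))
print(perplexity(diffusion_nlls))
```

Matching perplexity, as described above, means these two numbers come out essentially equal, even though the diffusion model generates tokens in parallel rather than one at a time.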
And so that was super exciting to me.
And I really wanted to see what happens if you train something bigger than a GPT-2 model.
Is it possible to build something commercially viable?
And that's why I started the company to scale things up.
Since then, yes, yes.
Now the Mercury models that we have in production are significantly larger.
They've been trained on more data.
There was a lot of engineering work that went into post-training the models and making sure that they would be useful for tasks that people care about, like commercial use cases of LLMs.
Yeah, so the models are still fairly large in terms of the number of parameters.
We're still using actually similar architectures.
So under the hood, it's still a transformer.