Stefano Ermon
We were matching the perplexity, but we were able to be like 10 times faster.
That was super exciting to me.
And I really wanted to see what happens if you train something bigger than a GPT-2 model, and whether it was possible to build something commercially viable.
And that's why I started the company to scale things up.
The arithmetic intensity of the inference workloads we have today with an autoregressive model is very bad.
The utilization is very low and that's why people are building massive data centers or even building custom chips, AI inference chips that are better suited for that kind of work.
Basically, if you can generate more tokens per second, what this means is that for the same amount of hardware, for the same number of GPUs, you can produce more tokens.
And so the cost per token is going to go down.
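The cost argument here is simple arithmetic: at a fixed hardware cost, higher throughput directly divides the cost per token. A minimal sketch, using purely illustrative numbers (the GPU price and token rates below are hypothetical, not figures from the conversation):

```python
# Illustrative cost-per-token arithmetic. The hourly GPU cost and
# throughput numbers are hypothetical examples, not real figures.

def cost_per_token(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Cost per generated token for one GPU at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour

# Same hardware (same hourly cost), two throughputs: a baseline
# server and one that generates tokens 10x faster.
baseline = cost_per_token(gpu_hourly_cost=2.0, tokens_per_second=100)
faster = cost_per_token(gpu_hourly_cost=2.0, tokens_per_second=1000)

print(f"baseline: ${baseline:.2e}/token, faster: ${faster:.2e}/token")
# At fixed hardware cost, 10x the throughput means 1/10 the cost per token.
```

The point of the sketch is that throughput and cost per token are exact reciprocals once the hardware spend is fixed, which is why faster generation translates directly into cheaper serving.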
And that's why we're able to serve our models much more cheaply than what you would get elsewhere, because we make better use of the existing hardware.
So now the Mercury models that we have in production are significantly larger.
They've been trained on more data.
That's going to enable Mercury models to be even smarter.
They're going to have much better planning and reasoning capabilities.
And so that's going to enable a lot of agentic use cases that people really care about, and make them really, really fast.
Thank you.
Pleasure to be here.
Good to see you again.