Azeem Azhar
The company wasn't formally acquired.
It's kind of complicated.
But that Groq acquisition really, really pointed to the changing shape of the AI market.
Up until that point, NVIDIA had thrived on a single, albeit evolving, architecture, that is, the GPU, the graphics processing unit.
It was its heritage coming out of video games.
And GPUs are great at many things, but it had been becoming clear over the last year or so
that they might not be fantastic for the changing shape of AI use as we move towards inference.
So now, this is a technical bit of my discussion.
So when you think about what happens in inference, I think it's worth just unpicking this because it'll explain what's going on.
There are a couple of phases.
The first phase is called pre-fill.
That's when you send a prompt to a model, whether it's a question or a document to summarize or some complex instruction.
The model reads and processes all of your input tokens simultaneously, in parallel.
This is enormously compute-intensive, and it is where GPUs shine.
They were built for graphics: throwing huge matrices of pixels at thousands of cores at once, doing it all in parallel.
And that is the shape of the pre-fill problem.
So when you're feeding context in, GPUs are doing what they were made to do.
But the second phase of inference is called decode, and that is the generation of the responses.
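To make the two phases concrete, here is a minimal sketch in Python. It is a toy, not a real transformer: the single weight matrix `WEIGHTS`, the two-number token embeddings, and the function names `prefill` and `decode` are all assumptions made up for illustration. What it shows is only the shape of the work: pre-fill handles every prompt token in one matrix multiply, while decode has to take one step per new token, each step depending on the one before.

```python
# Toy illustration only: the "model" is one hypothetical weight matrix,
# not a real transformer, and the numbers are made up.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) using plain lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical 2x2 weights applied to 2-dimensional token embeddings.
WEIGHTS = [[1, 0], [1, 1]]

def prefill(prompt_embeddings):
    """Process every prompt token in a single matrix multiply.

    All rows are independent, so hardware can compute them in parallel.
    """
    return matmul(prompt_embeddings, WEIGHTS)

def decode(last_state, steps):
    """Generate one new state per step; each step needs the previous result."""
    generated = []
    state = last_state
    for _ in range(steps):
        state = matmul([state], WEIGHTS)[0]  # one small multiply per new token
        generated.append(state)
    return generated

prompt = [[1, 2], [3, 4], [5, 6]]          # three prompt tokens
states = prefill(prompt)                   # one call covers all three tokens
new_tokens = decode(states[-1], steps=2)   # two sequential steps

print(states)      # [[3, 2], [7, 4], [11, 6]]
print(new_tokens)  # [[17, 6], [23, 6]]
```

The pre-fill call is one big batch of independent work, which is the kind of job a GPU's many cores are suited to. The decode loop cannot be batched across steps in the same way, because step two cannot start until step one has produced its output.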