Jensen Huang
And so that balance, that incredible balance between specialization, otherwise we can't accelerate beyond the CPU, and generalization, so that we can adapt to changing algorithms, that's really, really important.
That's the reason why CUDA has been so resilient
on the one hand, and yet we continue to enhance it.
We're at CUDA 13.2, and so we're evolving the architecture so fast that we can stay current with modern algorithms.
For example, when mixture of experts came out, that's the reason why we had NVLink 72 instead of NVLink 8.
We could now take an entire 4 trillion or 10 trillion parameter model and put it in one computing domain, as if it's running on one GPU.
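To see why a larger NVLink domain matters for mixture of experts, here is a minimal illustrative sketch, not NVIDIA's design: each token is routed to its top-k experts, and with experts sharded one per GPU, most routing decisions send a token's activations to a remote device, so all-to-all bandwidth inside the domain becomes the bottleneck. All sizes and names here are made up for illustration.

```python
import numpy as np

# Toy mixture-of-experts router: each token's gate scores pick its top-2 experts.
# If experts are sharded one per GPU, most assignments land on a *remote* GPU,
# which is why a large high-bandwidth domain (many GPUs on one NVLink fabric)
# helps MoE inference. Illustrative numbers only.

rng = np.random.default_rng(0)
n_tokens, n_experts, top_k = 16, 8, 2

gate_logits = rng.normal(size=(n_tokens, n_experts))
# Top-k expert indices per token (the two highest-scoring experts).
topk = np.argsort(gate_logits, axis=1)[:, -top_k:]

# Suppose each expert lives on its own GPU and all tokens start on GPU 0:
# every assignment to an expert other than expert 0 crosses the fabric.
cross_device = int(np.sum(topk != 0))
print(f"{cross_device} of {n_tokens * top_k} expert assignments are remote")
```

The fraction of remote assignments grows with the expert count, so a 72-GPU NVLink domain lets one model's experts stay on one fast fabric instead of spilling onto a slower network.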
People probably didn't notice I said it, but if you look at the architecture of the Grace Blackwell racks, it was completely focused on doing one thing: processing the LLM. All of a sudden, one year later, you're looking at a Vera Rubin rack.
It has storage accelerators.
It has this incredible new CPU called Vera.
It has Vera Rubin and NVLink 72 to run the LLMs.
It also has this new additional rack called Grok.
And so this entire rack system is completely different
than the previous one.
And it's got all these new components in it.
And the reason for that is because the last one was designed to run MoE large language model inference.
And this one is to run agents.
And agents bang on tools.
No, it's easier than that.