Jensen Huang
Podcast Appearances
You have to shard the data.
You have to shard the model.
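A minimal Python sketch of the two kinds of sharding for a single linear layer. It assumes nested lists stand in for tensors and a loop stands in for separate devices. The helper names (`matmul`, `shard_rows`, `shard_cols`, `data_parallel`, `model_parallel`) are hypothetical and not from any NVIDIA library. Data sharding gives every device a full copy of the weights and a slice of the batch. Model sharding gives every device a slice of the weights and the whole batch. Both should reproduce the unsharded result:

```python
# Sketch only: lists stand in for tensors, and each loop iteration stands in
# for a hypothetical separate device.

def matmul(x, w):
    """Multiply an (n x k) matrix x by a (k x m) matrix w."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))]
            for i in range(len(x))]

def shard_rows(x, parts):
    """Data sharding: split the batch (rows of x) into contiguous chunks."""
    size = -(-len(x) // parts)  # ceiling division
    return [x[i:i + size] for i in range(0, len(x), size)]

def shard_cols(w, parts):
    """Model sharding: split the weight matrix column-wise, one slice per device."""
    m = len(w[0])
    size = -(-m // parts)
    return [[row[j:j + size] for row in w] for j in range(0, m, size)]

def data_parallel(x, w, devices):
    # Every device holds all of w and processes its own slice of the batch.
    out = []
    for x_shard in shard_rows(x, devices):
        out.extend(matmul(x_shard, w))
    return out

def model_parallel(x, w, devices):
    # Every device holds a column slice of w and sees the whole batch;
    # the partial outputs must then be gathered and concatenated per row.
    partials = [matmul(x, w_shard) for w_shard in shard_cols(w, devices)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

if __name__ == "__main__":
    x = [[1, 2], [3, 4], [5, 6], [7, 8]]  # batch of 4 inputs, 2 features each
    w = [[1, 0, 2], [0, 1, 3]]            # 2 x 3 weight matrix
    full = matmul(x, w)
    assert data_parallel(x, w, devices=2) == full
    assert model_parallel(x, w, devices=2) == full
```

On real hardware the gather in `model_parallel`, and the gradient exchange that data parallelism needs during training, is traffic over the network, which is the cost the next remark is about.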
Now, all of a sudden, when you distribute the problem this way, not just scaling up the problem but distributing it, then everything gets in the way.
This is the Amdahl's law problem, where the amount of speedup you get for something depends on what fraction of the total workload it represents.
And so if computation represents 50% of the problem, and I sped up computation infinitely, like a million times, I only sped up the total workload by a factor of two.
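The arithmetic here is Amdahl's law: if a fraction p of the runtime is sped up by a factor s, the overall speedup is S = 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p). A short Python sketch, with illustrative function names:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated by a factor s.

    Amdahl's law: S = 1 / ((1 - p) + p / s).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)

def amdahl_limit(p):
    """Upper bound on the overall speedup as s grows without bound: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)

if __name__ == "__main__":
    # Computation is 50% of the workload and is made a million times faster.
    print(round(amdahl_speedup(0.5, 1_000_000), 6))  # 1.999998
    print(amdahl_limit(0.5))                         # 2.0
```

The untouched half of the workload dominates: no matter how large s gets, the total never reaches a factor of two.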
Now, all of a sudden, not only do you have to distribute the computation and shard the pipeline somehow, you also have to solve the networking problem, because you've got all of these computers connected together.
In distributed computing at the scale that we do, the CPU is a problem, the GPU is a problem, the networking is a problem, the switching is a problem, and distributing the workload across all these computers is a problem.
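One way to see why the network becomes the problem is a toy cost model. It assumes data-parallel training where compute divides perfectly across n GPUs and gradients are combined with the textbook ring all-reduce (2(n - 1) steps, each moving 1/n of the message). The parameter values are made up for illustration, and none of this is NVIDIA's actual model:

```python
# Toy model: every parameter value used below is a hypothetical illustration.

def ring_allreduce_time(n, message_bytes, bandwidth, latency):
    """Textbook ring all-reduce cost over n devices.

    2 * (n - 1) steps, each sending message_bytes / n over a link with the
    given bandwidth (bytes/s) and per-step latency (s).
    """
    if n == 1:
        return 0.0
    return 2 * (n - 1) * (latency + message_bytes / (n * bandwidth))

def step_time(n, compute_seconds, message_bytes, bandwidth, latency):
    """One training step: perfectly divided compute plus a gradient all-reduce."""
    return compute_seconds / n + ring_allreduce_time(
        n, message_bytes, bandwidth, latency)

def best_device_count(max_n, **cost):
    """Device count in 1..max_n that minimizes step time under this model."""
    return min(range(1, max_n + 1), key=lambda n: step_time(n, **cost))

if __name__ == "__main__":
    cost = dict(compute_seconds=1.0,   # 1 s of compute on a single GPU
                message_bytes=1e9,     # 1 GB of gradients
                bandwidth=50e9,        # 50 GB/s per link
                latency=1e-4)          # 0.1 ms per ring step
    best = best_device_count(1024, **cost)
    print(best, step_time(best, **cost), step_time(1024, **cost))
```

Under these made-up numbers the step time bottoms out at 69 GPUs; past that point the latency term grows faster than the compute term shrinks, so adding machines makes each step slower. Faster links and switches move that optimum outward.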
It's just a massively complex computer science problem.
And so we've got to bring every technology to bear.
Otherwise, we scale up linearly, or we scale up based on the capabilities of Moore's Law, which has largely slowed because Dennard scaling has slowed.
There's the first question: what is extreme co-design?
You're optimizing across the entire stack, from architectures to chips to systems to system software to the algorithms to the applications.
That's one layer.
The second thing that you and I just talked about goes beyond that.
CPUs and GPUs and networking chips and scale-up switches and scale-out switches. And then, of course, you've got to include power and cooling and all of that, because all these computers are extremely power hungry. They do a lot of work and they're very energy efficient, but in aggregate they still consume a lot of power. And so that's one. The first question is: what is it?
The second question is, why is it?
And we just spoke about the reason: you want to distribute the workload so that you can exceed the benefit of just increasing the number of computers.