Chris Lattner
Well, this is actually a small version of the same problem you have in a data center.
In a data center, you now have multiple different machines, sometimes very specialized, sometimes with GPUs or TPUs in one node and sometimes with disks in another node.
And so you get a much larger scale heterogeneous computer.
And so what ends up happening is you have this multi-layer abstraction: hierarchical parallelism, asynchronous communication. And making that manageable, again, my enemy is complexity.
By moving away from different specialized systems at every part of the stack toward more consistency and uniformity, I think we can help lift the world, make it much simpler, and actually get it used.
Yeah, so there's a pretty well-known algorithm, and what you're doing is you're looking at two factors.
You're looking at the cost of sending data from one thing to another, because it takes time to move it from one side of the chip to the other, and things like this. And then you're looking at the time it takes to do an operation on a particular block.
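To make that concrete, here's a minimal sketch of that two-factor cost model in Python. The function names and parameters are hypothetical, not from any particular system:

```python
# Two-factor cost model (hypothetical sketch): the cost of running an
# op on a device is the time to move the data there plus the time the
# device needs to execute the op.

def transfer_time(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to ship the operands across the chip or interconnect."""
    return num_bytes / bandwidth_bytes_per_s

def compute_time(flops: float, device_flops_per_s: float) -> float:
    """Time for the device itself to execute the operation."""
    return flops / device_flops_per_s

def total_cost(num_bytes: float, bandwidth: float,
               flops: float, throughput: float) -> float:
    return transfer_time(num_bytes, bandwidth) + compute_time(flops, throughput)
```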
So take CPUs. CPUs are fully general; they can do anything, right?
But then you have a neural net accelerator that's really good at matrix multiplications, okay?
And so you say, okay, well, if my workload is all matrix multiplications, I send the data over to the neural net thing, it goes and does the matrix multiplications, and when it's done, it sends me back the result. All is good, right?
And so the simplest thing is just saying, do matrix operations over there, right?
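That simplest rule is just a dispatch on operation type; a sketch, with made-up device names:

```python
# Naive placement: route purely by operation type, ignoring costs.
def place(op_type: str) -> str:
    """Matrix ops go to the accelerator; everything else stays on the CPU."""
    return "accelerator" if op_type == "matmul" else "cpu"
```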
But then you realize it gets a little more complicated, because you can do matrix multiplications on a GPU. You can do them on a neural net accelerator. You can do them on a CPU, and they'll have different trade-offs and costs.
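Once several devices can run the same op, the placement choice falls out of the cost model sketched above: pick whichever device minimizes transfer time plus compute time. A self-contained sketch, with made-up bandwidth and throughput numbers that real systems would get from profiling:

```python
# Hypothetical device table: bandwidth to the device (bytes/s) and
# matmul throughput (FLOP/s). The numbers are illustrative only.
DEVICES = {
    "cpu":         {"bandwidth": 50e9, "throughput": 0.5e12},
    "gpu":         {"bandwidth": 16e9, "throughput": 20e12},
    "accelerator": {"bandwidth": 16e9, "throughput": 100e12},
}

def best_device(num_bytes: float, flops: float) -> str:
    """Pick the device with the lowest transfer-plus-compute cost."""
    def cost(name: str) -> float:
        spec = DEVICES[name]
        return num_bytes / spec["bandwidth"] + flops / spec["throughput"]
    return min(DEVICES, key=cost)

# Small matmul: the transfer dominates, so the data stays on the CPU.
print(best_device(num_bytes=3 * 64 * 64 * 4, flops=2 * 64**3))        # cpu
# Large matmul: the compute dominates, so shipping it off pays for itself.
print(best_device(num_bytes=3 * 8192 * 8192 * 4, flops=2 * 8192**3))  # accelerator
```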
And it's not just matrix multiplication.
And so what you actually look at is