Dylan Patel
Yes, this CapEx is crazy, but I'm not doubling my hardware every two months.
Yeah.
But I'm doubling my tokens every two months.
So there has to be enough of a cost decrease.
And there is at a given level of intelligence.
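The arithmetic behind this point can be sketched roughly as follows. Doubling tokens every two months compounds to 2^6 = 64x per year; if the hardware fleet grows more slowly, the gap has to be closed by each token needing less hardware, which is to say by cost per token falling. The 2x yearly hardware growth used below is purely an illustrative assumption, not a figure from the conversation, and `required_cost_decrease` is a hypothetical helper name.

```python
def required_cost_decrease(token_doubling_months, hardware_growth_per_year, months=12):
    """Factor by which hardware-per-token must fall over `months`
    so that the available hardware can serve the token growth."""
    # Tokens served grow by 2x every `token_doubling_months`.
    token_growth = 2 ** (months / token_doubling_months)
    # Whatever growth hardware does not cover must come from efficiency.
    return token_growth / hardware_growth_per_year


# Tokens double every two months (from the transcript).
# Hardware growing 2x per year is an assumed, illustrative number.
factor = required_cost_decrease(token_doubling_months=2, hardware_growth_per_year=2.0)
print(f"Cost per token must fall by about {factor:.0f}x over the year")
```

Under these assumed numbers, tokens grow 64x while hardware grows 2x, so each token has to get roughly 32x cheaper to serve within the year.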
All of these things are curves and it's a trade-off, right?
Everything in engineering is a trade-off.
So you have inference latency versus cost on any given hardware.
GPUs can do lower latency to a certain extent, but then the cost is way higher.
Or you can do really, really high throughput and the cost is way lower.
And the company, they set the dial where they think it makes the most sense.
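One way to picture the dial described here is a toy batching model: serving more requests per step raises throughput and lowers cost per token, but each request waits longer for its next token. This is only a sketch under made-up assumptions. The timing constants, the hourly GPU price, and the function names are all hypothetical and do not describe any real hardware or provider.

```python
def decode_step_seconds(batch_size, fixed_s=0.020, per_seq_s=0.001):
    """Time for one decode step in a toy model: a fixed overhead
    plus a small per-sequence cost (both values are assumptions)."""
    return fixed_s + per_seq_s * batch_size


def cost_per_token(batch_size, gpu_dollars_per_hour=2.0):
    """Dollars per generated token: one step emits one token per sequence,
    so the step's GPU cost is shared across the whole batch."""
    step = decode_step_seconds(batch_size)
    return gpu_dollars_per_hour / 3600.0 * step / batch_size


# Sweep the "dial": small batches favor latency, large batches favor cost.
for batch in (1, 8, 64):
    latency_ms = decode_step_seconds(batch) * 1000
    print(f"batch={batch:3d}  latency/token={latency_ms:6.1f} ms  "
          f"cost/token=${cost_per_token(batch):.8f}")
```

In this model, moving from a batch of 1 to a batch of 64 makes each token slower to arrive but much cheaper, which is the shape of the latency-versus-cost curve the speaker is describing.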
Yeah.
And there's other types of hardware which aim for their curve to be at a different spot.
Maybe the GPU curve is here, but latency, you know, over here, you know, you're in very diminishing returns.
And so actually someone made a little curve right here.
It's like, okay, maybe that's a useful point, but actually the market cares about this point.
So anyways, there's a curve of like, who cares about latency?
I think if I could just press a magic button, is it capacity?
Is it latency?