Lennart Heim
👤 PersonAppearances Over Time
Podcast Appearances
But there is a reason right now why NVIDIA is like so high on the stock market.
So it costs you billions to just get the GPUs.
Of course, it's better a lot of times you don't need to actually build your own cluster.
You can just go to a cloud computing company and rent the GPUs, right?
So you tell them, hey, could I get 30,000 GPUs, say, for six months?
If you do it this way, then current systems cost them like the three-digit millions for the computing power alone.
DeepSeek puts out this number.
It costs about $6 million.
This does only include the amortized cost for buying the GPUs.
It does not include the salaries.
It doesn't include all of your failures and experiments.
We at least know from one paper, I think it was the science paper where Hugging Face played a big role, where they said the total amount of compute spend was actually 3x the amount we spend on the final training round.
You make mistakes, you do de-risking training rounds.
I'm not going to spend $100 billion on a new architecture, which I haven't tested.
So I start with a small architecture, scale it up, see if it works.
That's the whole idea, right?
It's trial and error, and you throw a compute at it.
And someone will say, sure, why not?
Let's burn it, and hopefully we get a good system out of it.
So everything we just said before is still true.