Dylan Patel
And we know it can at least get as efficient as us.
Maybe the compute substrate isn't the same, but like whatever.
Adding more people to the problem doesn't make it faster.
Because there are so many things you're trying. Learning from experiments means you run these experiments, you learn something, and then you implement it.
And then you try a bunch more experiments, right?
You tweak these knobs a hundred different ways.
And then you see the trend line and you're like, oh, so actually I should tweak it this way.
Let's implement that.
There's so much gut feel.
There's so much reading data, understanding it, and re-implementing it into these things that if you add people, you're going to slow it down.
And in a sense, a lot of Meta's problems before they did the superintelligence thing were that they just had too many people who weren't led by leadership that was amazing.
And they had a lot of failed experiments and wasted time doing things that didn't matter.
There's a tweet from one of my friends at OpenAI.
He's pretty famous on Twitter.
His name's Roon.
He made a tweet like, I get viscerally angry every time I think about how many H100s Meta's wasting.
It's such a funny tweet because, well, yeah, they're wasting a ton of compute.
They were, and maybe they still are, but everyone's wasting compute, right?
OpenAI is wasting tons of compute too, because who actually knows what the Pareto-optimal model architecture is?