Dylan Patel
There's a ton of... like, the data flywheel needs to get going there.
I think it's like tremendously hilarious that people are like, oh my God, this person's getting paid a billion dollars.
Or like, oh my God, this person's getting paid a hundred million dollars.
It's hilarious to me that people think that's infeasible.
It's like, how could this person possibly be worth that much?
Well, they're running the experiments on chips that cost a hundred billion dollars.
Think about every wasted experiment they run. Say they just use a third of the compute, and their ideas and their impact on it decide whether that compute gets wasted on an idea that was already done. There's so much wasted compute, if you even want to call it wasted.
It's trying stuff and failing.
But like none of us know what to try and what not to try.
And these things are so complicated.
There's like a group of people just trying different stuff on the existing data.
How do you mix it?
What order do you feed it into the model?
How do you filter it?
There's a different group of people that are doing, what's the architecture?
There's different people working on long context.
There are different people working on every single aspect of the model, and if you make any of them just a little bit more efficient, so that they come up with an idea that's 5% more efficient...
Well, fantastic.
I just saved not only 5% of my training time.
I also saved 5% across my entire inference fleet because we're so far away from like these models being anywhere near as efficient as a human brain.
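The back-of-envelope math behind this argument can be sketched in a few lines of Python. This is a minimal illustration, not a cost model: it assumes, purely for simplicity, that a 5% gain applies uniformly to the full hundred-billion-dollar compute spend across training and inference. The dollar figures come from the conversation; the constant and function names are hypothetical.

```python
# Back-of-envelope sketch: value of a 5% efficiency gain vs. researcher pay.
# Assumption (illustrative only): the gain applies uniformly to the entire
# $100B compute spend, covering both training and the inference fleet.

COMPUTE_SPEND_USD = 100e9          # "chips that cost a hundred billion dollars"
EFFICIENCY_GAIN = 0.05             # "an idea that's 5% more efficient"
PAY_PACKAGES_USD = [100e6, 1e9]    # "a hundred million" and "a billion dollars"


def compute_saved(spend: float, gain: float) -> float:
    """Dollar value of compute freed up by a fractional efficiency gain."""
    return spend * gain


saved = compute_saved(COMPUTE_SPEND_USD, EFFICIENCY_GAIN)
for pay in PAY_PACKAGES_USD:
    # Ratio of compute saved to the pay package under the assumption above.
    print(f"pay ${pay / 1e9:.1f}B -> savings are {saved / pay:.0f}x the pay")
```

Under that assumption, a single 5% idea frees up about $5B of compute, which is 50 times a $100M package and 5 times a $1B one.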