Gavin Baker
My firm is an investor in xAI.
I would just say we're already building models of models. You know, almost every application startup I'm aware of is chaining models: you start with the cheap model, you check the cheap model's work with a more expensive model. Lots of very clever things are being done. Every AI application company has what's called a router, so they can swap out the underlying model if another one is better for the task at hand.
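A minimal sketch of that cheap-then-check router pattern, assuming a generic text-completion API; call_model, check_answer, and the model names below are hypothetical stand-ins, not any particular vendor's SDK:

```python
# Router/cascade sketch: try a cheap model first, grade its answer with a
# stronger model, and escalate to the expensive model only when the check
# fails. All names here are illustrative placeholders.

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"[{model}] answer to: {prompt}"

def check_answer(prompt: str, answer: str, checker: str) -> bool:
    """Ask the stronger model to grade the cheap model's answer."""
    verdict = call_model(
        checker,
        f"Question: {prompt}\nProposed answer: {answer}\n"
        "Reply YES if the answer is acceptable, otherwise NO.",
    )
    return "YES" in verdict.upper()

def route(prompt: str) -> str:
    cheap, expensive = "small-model", "large-model"  # hypothetical model names
    answer = call_model(cheap, prompt)               # start with the cheap model
    if check_answer(prompt, answer, expensive):      # verify with the stronger one
        return answer
    return call_model(expensive, prompt)             # escalate only on failure

print(route("What is the capital of France?"))
```

In practice the checker and the fallback are often the same strong model, so the expensive call is only paid when the cheap answer fails the check.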
Um, as far as what the wall is: there's been a big debate that we were hitting a wall on these scaling laws, that scaling laws were breaking down. And I just thought that was deeply silly, because no one had built a cluster bigger than, you know, 32,000 H100s. And nobody knew.
It was a ridiculous debate, and there were really smart people on both sides, but there was no new data. Grok 3 is the first new data point on whether scaling laws are breaking or holding, because no one else thought you could make 100,000 Hoppers coherent. And I think, based on public reports, they're going to 200,000 Hoppers. And then the next tick is a million.
It was reported they're going to be first in line for Blackwell. But Grok 3 is a big data point and will resolve this question of whether or not we're hitting a wall. The other question you raise, David, is very interesting. And by the way, we should note there is now a new axis of scaling. Some people call it test-time compute; some people call it inference scaling.
And basically the way this works, you just think of these models as human: the more you speak to one of these models the way you'd speak to your 17-year-old going off to take the SAT, the better it will do for you. As a human, if I ask you, David, what's 2 plus 2? Four flashes in your mind right away.
If I ask you to come up with a grand unified theory of physics that accounts for both quantum mechanics and relativistic physics, you will think for a lot longer. 147. Yeah, nobody knows. We have been giving these models the same amount of time to think, no matter how complicated the question was.
What we've now learned is that if you let them think for longer about more complex questions, test-time compute, you can dramatically improve their IQ. So we're just at the beginning of this new scaling law. But I think the question you raise on ROI is very good, and I'm happy to address it.
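A toy sketch of that test-time-compute idea, assuming a hypothetical generate() call that takes an explicit thinking budget (real APIs expose this differently, for example as reasoning-effort settings): easy questions get a small budget, hard ones get a much larger one.

```python
# Test-time-compute sketch: allocate more reasoning tokens to harder
# questions instead of a fixed amount. classify_difficulty is a crude
# heuristic stand-in; generate() is hypothetical, not a real API.

THINKING_BUDGET = {"easy": 256, "hard": 8192}  # max reasoning tokens (illustrative)

def classify_difficulty(prompt: str) -> str:
    """Crude stand-in: treat longer questions as harder."""
    return "hard" if len(prompt.split()) > 8 else "easy"

def generate(prompt: str, thinking_tokens: int) -> str:
    """Hypothetical model call that may spend up to `thinking_tokens` reasoning."""
    return f"answer to {prompt!r} after up to {thinking_tokens} thinking tokens"

def answer(prompt: str) -> str:
    budget = THINKING_BUDGET[classify_difficulty(prompt)]
    return generate(prompt, thinking_tokens=budget)

print(answer("What's 2 plus 2?"))  # easy: small budget, answered almost instantly
print(answer("Unify quantum mechanics and relativistic physics "
             "into one grand unified theory"))  # hard: much larger budget
```

Production systems vary the mechanism (longer chains of thought, more samples, search), but the shape is the same: the harder the question, the more compute spent at inference time.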
Oh yeah. Even if scaling laws for training break, we have another decade of innovation ahead of us. Exactly.
Oh, absolutely not.