Jonathan Ross
Because the other thing is, while it sort of asymptotes, the question is: on this curve, where do you stop? It depends on how many people you have doing inference. You can either make the model bigger, which makes it more expensive to run, and then you train it on less. Or you make it smaller, and it's cheaper to run, but you have to train it more. So DeepSeek didn't have a lot of users until recently.
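A minimal sketch of that tradeoff, using made-up numbers rather than anything from the conversation: training compute is often approximated as roughly 6 × parameters × training tokens, and inference compute per generated token as roughly 2 × parameters. Which configuration is cheaper overall depends entirely on how many tokens you end up serving.

```python
# Illustrative sketch only -- hypothetical figures, not real DeepSeek/Llama numbers.
# A bigger model trained on fewer tokens and a smaller model trained on more tokens
# can land at similar quality; total cost depends on how much inference you serve.

COST_PER_FLOP = 1e-17  # hypothetical dollars per FLOP, chosen only for scale

def training_cost(params, tokens):
    # Rule of thumb: ~6 FLOPs per parameter per training token.
    return 6 * params * tokens * COST_PER_FLOP

def inference_cost(params, tokens_served):
    # Rule of thumb: ~2 FLOPs per parameter per generated token.
    return 2 * params * tokens_served * COST_PER_FLOP

# Two hypothetical models aimed at similar quality:
big   = dict(params=70e9, train_tokens=2e12)    # bigger, trained less
small = dict(params=8e9,  train_tokens=15e12)   # smaller, trained much more

for served in (1e9, 1e12, 1e15):                # total tokens served to users
    big_total   = training_cost(**{"params": big["params"], "tokens": big["train_tokens"]}) \
                  + inference_cost(big["params"], served)
    small_total = training_cost(small["params"], small["train_tokens"]) \
                  + inference_cost(small["params"], served)
    print(f"served={served:.0e}  big=${big_total:,.0f}  small=${small_total:,.0f}")
```

With few tokens served, the bigger, less-trained model looks fine; at very high serving volume, the smaller, more-trained model wins, which is the point about where you stop on the curve.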
And so for them, it would never have made sense to train it a lot anyway. They would much rather have a bigger model. But now what you're going to see is all these other people either making smaller models or trying to make higher-quality ones of the same size, but just training them more.
So they ran out of compute. And this is the other reason why chip startups are going to do just fine: they ran out of inference compute. You train it once, but then... You spend money to make the model, like designing a car, but then each car you build costs you money, right? Well, each query that you serve requires hardware. Training scales with the number of ML researchers you have.
Inference scales with the number of end users you have.
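A back-of-the-envelope version of the car analogy, with placeholder numbers: training is a one-time cost, like designing the car, while serving cost grows with every user and every query.

```python
# Hypothetical placeholder figures, purely to illustrate the scaling argument.

TRAINING_COST = 50_000_000        # one-time cost to train the model ($)
COST_PER_QUERY = 0.002            # hardware/energy cost to serve one query ($)
QUERIES_PER_USER_PER_DAY = 20

def daily_inference_cost(num_users):
    # Serving cost scales with the number of end users; training does not.
    return num_users * QUERIES_PER_USER_PER_DAY * COST_PER_QUERY

for users in (1e5, 1e7, 1e9):
    print(f"{users:.0e} users -> ${daily_inference_cost(users):,.0f}/day in inference, "
          f"vs ${TRAINING_COST:,} to train once")
```

At a hundred thousand users the training bill dominates; at a billion users the daily inference bill quickly dwarfs it, which is why running out of inference compute is the constraint that matters.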
I think they marketed it very well. You look at some of the publications and they make it sound like it's a philosophical thing. And, you know, they talk about how they spent six million on the GPUs, and everyone just zoomed in on that, neglecting the fact that Llama's first model was trained on, I think, five million worth of GPU time. And it set the world on fire in a good way. And then
ignoring the fact that they spent a ton generating the data and all this. They're really good at marketing. I think they were probably surprised at how well it worked, but I think this is what they were going for.
What's up with the $500 billion Stargate effort?
I've gone back and forth on that. I actually did. So Gavin Baker tweeted some math. Before I saw that tweet, I came up with very similar math. However, talking to some people in the know, some of the comments are actually that they've got it. But then you keep pressing, and it's like, well, maybe there's some cutesiness to it?