Eiso Kant
Podcast Appearances
We are taking huge amounts of data and we're forcing this generalization of learning to happen in a very small space. And this is why we essentially see these differences in capabilities. For larger models it's essentially easier to generalize, because we're not forcing so much data into such a small compression space.
And so my personal mental model of this comes down to the scale of your models. This has been shown over and over again; by the way, we owe a debt of gratitude to Google and to OpenAI for proving out the scaling laws, which essentially say that as we provide more data and more parameters, more scale, hence more compute for these models, we get more and more capable models.
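As a rough illustration of what the scaling laws claim, here is a minimal sketch of a Chinchilla-style loss formula in Python; the power-law form is the well-known one, but the constants below are illustrative placeholders, not published fits.

```python
# Illustrative sketch of a Chinchilla-style scaling law: loss falls as a
# power law in both parameter count N and training tokens D.
# The constants here are made-up placeholders, not the published fits.

def scaling_loss(n_params: float, n_tokens: float,
                 e: float = 1.7, a: float = 400.0, b: float = 400.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta."""
    return e + a / n_params**alpha + b / n_tokens**beta

# More parameters and more data (hence more compute) -> lower loss,
# with diminishing returns as both terms approach the irreducible E.
for n, d in [(1e8, 2e9), (1e9, 2e10), (1e10, 2e11), (1e11, 2e12)]:
    print(f"N={n:.0e} params, D={d:.0e} tokens -> loss ~ {scaling_loss(n, d):.3f}")
```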
Now there is a limit to that, most likely. Think of the analogy to compression: an image you had at high resolution, compressed down to a small resolution. The small models are the small resolution. You get generalization, but you're losing things. But in the infinite extreme, an infinitely large model wouldn't be doing any compression.
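A minimal numeric sketch of that compression analogy, assuming nothing about real models: downsampling a high-resolution array keeps the broad structure but throws away detail, and an "infinitely large" model would just be the identity map that compresses nothing.

```python
# Tiny sketch of the compression analogy: averaging a "high-resolution"
# array down to a small one keeps the broad structure (the generalization)
# but destroys detail. Purely illustrative; unrelated to any real model.
import numpy as np

rng = np.random.default_rng(0)
high_res = rng.normal(size=(256, 256))          # stand-in for the full training data

# Average 8x8 blocks down to a 32x32 "small model" representation.
low_res = high_res.reshape(32, 8, 32, 8).mean(axis=(1, 3))

# Upsample back and measure what was lost in the round trip.
reconstructed = np.kron(low_res, np.ones((8, 8)))
print("information lost (mean abs error):", np.abs(high_res - reconstructed).mean())
# An "infinitely large" model would be the identity map: nothing compressed,
# and therefore no forced generalization either.
```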
So there is definitely a limit at some point to model size. But what underpins all of this, to directly answer your question, is the compute. And the compute really, really matters. Your own proprietary advantages in your applied research to get great data, or to gather it, matter just as much. But if you don't have the compute, you're not in the race.
We are horribly inefficient at learning today. If you think about what drives efficiency of learning, it's the algorithms and it's the hardware itself. We've got probably decades, if not hundreds of years, of improvements still left there, in different forms over time. If we look very practically at the coming years, we are going to see increasing advantages on the hardware.
We're going to see increasing advantages on the algorithms. But I hope everyone takes away that this is table stakes. This is something that you have to do to be in this space, and you have to be excellent at it. It's not what differentiates you; it's what allows you to keep up with everyone else.
I think the biggest cognitive dissonance that people have around synthetic data is that a model is generating data to then actually become smarter itself, right? It feels like a snake eating itself. There's something that doesn't make sense in it. Now, the way that you need to look at that is that there's actually another step in that loop.
There's something that determines what, from all the data that the model generated, is worth keeping. In my domain, software development, I have a task in a code base and the model generates 100 different solutions. If I would just feed those hundred different solutions back to the model in its training, the model won't get smarter. That's the snake eating itself.
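A minimal sketch of that extra step in the loop, with a hypothetical generate_solutions placeholder standing in for the model: candidate solutions only become useful synthetic training data after something external, here a unit test, verifies them.

```python
# Hedged sketch of the "extra step in the loop": generated solutions are only
# kept as training data if an external check verifies them. Here that check
# is a unit test; generate_solutions is a hypothetical placeholder for
# sampling from a model, not a real API.
import random
from typing import List

def generate_solutions(task: str, n: int = 100) -> List[str]:
    """Placeholder for sampling n candidate solutions from a model."""
    return [f"def add(a, b): return a {op} b"
            for op in random.choices(["+", "-", "*"], k=n)]

def passes_tests(solution: str) -> bool:
    """The verification step: run the candidate against known test cases."""
    namespace: dict = {}
    try:
        exec(solution, namespace)   # only ever do this in a sandbox with untrusted code
        return namespace["add"](2, 3) == 5 and namespace["add"](-1, 1) == 0
    except Exception:
        return False

candidates = generate_solutions("implement add(a, b)")
verified = [s for s in candidates if passes_tests(s)]
print(f"kept {len(verified)}/{len(candidates)} solutions as synthetic training data")
```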