Eiso Kant
Podcast Appearances
There's something that determines whether all the data the model generated is useful. In my domain, software development: I have a task in a code base and the model generates 100 different solutions. If I just fed those hundred different solutions back to the model in its training, the model wouldn't get smarter. That's the snake eating itself.
But if you have something that can act as an oracle of truth, something that can say, this is better and this is worse, or this is correct and this is wrong, that's when you can actually use synthetic data.
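To make that concrete, here is a minimal Python sketch of oracle-filtered synthetic data. The `generate` and `oracle` callables are hypothetical stand-ins (for code, the oracle might be the project's test suite); this illustrates the idea, not any particular company's pipeline.

```python
# Minimal sketch of oracle-filtered synthetic data, assuming hypothetical
# `generate` and `oracle` callables. Not a real training pipeline.
from typing import Callable, List

def collect_synthetic_data(
    task_prompt: str,
    generate: Callable[[str], str],   # samples one candidate solution
    oracle: Callable[[str], bool],    # e.g. "does the test suite pass?"
    num_samples: int = 100,
) -> List[dict]:
    """Generate many candidate solutions, keep only oracle-approved ones.

    Feeding all raw samples back into training is "the snake eating
    itself"; the oracle's correct/wrong signal is what makes the kept
    examples informative for further training.
    """
    kept = []
    for _ in range(num_samples):
        candidate = generate(task_prompt)
        if oracle(candidate):  # keep only what the oracle deems correct
            kept.append({"prompt": task_prompt, "completion": candidate})
    return kept

# Toy usage: a fake "model" and an oracle that checks the answer.
data = collect_synthetic_data(
    "def add(a, b): ...",
    generate=lambda p: "def add(a, b): return a + b",
    oracle=lambda code: "return a + b" in code,
    num_samples=5,
)
print(len(data), "oracle-approved examples kept")
```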
We are starting to understand the scaling laws better. The first version of the scaling laws spoke about the amount of data we provided during training and the size of the model. More data means longer training, and a larger model requires more compute. So we often say the scaling laws are about applying more compute. And that's actually more correct than we initially realized, because the importance of synthetic data for models to get better is another form of using compute, only we're using it at inference time. We're running these models to generate these solutions: 100, 1,000, or 50. I think we still have a lot of room for scaling up models. We can do this by scaling up data, and we can do this by scaling up the size of the model.
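As a rough back-of-the-envelope sketch of that trade-off, using the standard approximations of about 6·N·D FLOPs for training and about 2·N FLOPs per generated token at inference (N parameters, D training tokens, dense model). The concrete numbers below are illustrative assumptions, not real figures.

```python
# Back-of-the-envelope compute sketch: training vs. inference-time
# sampling, using the standard approximations train ≈ 6*N*D and
# inference ≈ 2*N FLOPs per token for a dense model.

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

def sample_flops(n_params: float, tokens_per_sample: float,
                 num_samples: int) -> float:
    # Generating k candidate solutions spends compute at inference time.
    return 2.0 * n_params * tokens_per_sample * num_samples

N = 70e9   # 70B-parameter model (assumed)
D = 2e12   # 2T training tokens (assumed)
print(f"training:    {train_flops(N, D):.2e} FLOPs")
# 100 sampled solutions of ~1,000 tokens each, for one task:
print(f"100 samples: {sample_flops(N, 1e3, 100):.2e} FLOPs")
```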
Now, our opinion is that there's still a lot of room to scale the number of parameters and the size of models. But there's something we don't really talk about as much in our industry. We're training extremely large models, and by the way, until very recently we weren't even capable of doing so, because we didn't have the compute and the capital.
This is why our fundraise has been so important to us: it gives us the capital to scale up. But the challenge everyone then faces is how to take extremely large models and run them cost-efficiently for end users.
You have a multi-trillion parameter model, often architected as an MoE, meaning that not all of those parameters activate at inference time, but it's still very large. It's too expensive: every request you make to that model costs more than a couple of cents. And so you have to find a way to build models that you can actually run for customers.
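A toy sketch of why an MoE keeps inference cheaper than its parameter count suggests: per token, a router picks the top-k experts, so only a fraction of the total parameters do any work. All sizes and weights below are made up for illustration.

```python
# Toy mixture-of-experts layer: per token, only the top-k experts run,
# so active parameters per token << total parameters. Sizes are made up.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
num_experts, top_k = 8, 2

# Each expert is a small feed-forward block (d_model x d_ff, d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector through its top-k experts."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                 # top-k expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w_in, w_out = experts[i]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU FFN expert
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)

total = num_experts * 2 * d_model * d_ff
active = top_k * 2 * d_model * d_ff
print(f"total expert params: {total:,}, active per token: {active:,}")
```

With 8 experts and top-2 routing, only a quarter of the expert parameters activate per token, which is the same reason a multi-trillion parameter MoE can serve requests far more cheaply than a dense model of equal size.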