Eiso Kant
Podcast Appearances
There's a lot more details behind this, but the way to think about it is if you can simulate it, you can actually build an extremely large data set. And part of the things that we synthetically generate is not just the output code, but it's the intermediate thinking and reasoning to get to that output code.
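To make the idea concrete, here is a minimal sketch of what one synthetic training record could look like when the intermediate reasoning is stored alongside the final code. The schema and field names are illustrative assumptions, not poolside's actual data format.

```python
# Illustrative schema for a synthetic sample that keeps the intermediate
# "thinking" steps as part of the training signal, not just the output code.
from dataclasses import dataclass


@dataclass
class SyntheticSample:
    prompt: str                 # the task given to the generating model
    reasoning_trace: list[str]  # intermediate thinking steps leading to the code
    output_code: str            # the final code the trace arrives at
    passed_execution: bool = False  # whether the code later ran/tested successfully


sample = SyntheticSample(
    prompt="Write a function that returns the n-th Fibonacci number.",
    reasoning_trace=[
        "The naive recursion is exponential; an iterative loop is O(n).",
        "Track the last two values and update them each iteration.",
    ],
    output_code=(
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
    ),
)
```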
Because models today, and you can try this yourself by going online and chatting to any model, can actually produce their thinking. They're not very good at it yet. So what do you do when your thinking is not very good? You need feedback. In our case, deterministic feedback, code execution feedback.
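A rough sketch of what deterministic execution feedback can mean in practice: run the model-generated code against known tests and treat pass/fail as the signal. The harness below is a simplified assumption; the sandboxing and reward shaping used in a real training pipeline are not shown.

```python
# Run candidate code plus its tests in a subprocess; the exit code is the
# deterministic feedback (0 = all asserts passed).
import subprocess
import sys
import tempfile


def execution_feedback(candidate_code: str, test_code: str, timeout: int = 10) -> bool:
    """Return True if the candidate code passes the tests, False otherwise."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # code that hangs counts as a failure


# The feedback is binary and reproducible for the same inputs.
print(execution_feedback(
    candidate_code="def add(a, b):\n    return a + b",
    test_code="assert add(2, 3) == 5",
))  # True
```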
We are making in our space, especially I think post the ChatGPT moment, like incredible advancements in the algorithms that are making learning more efficient. Internally, I have this thing that I say to the team, and they're probably tired of hearing me because I say it every single day.
I say, all the work we do on foundation models, on one hand, is improving their compute efficiency for training or running them, or on the other hand, improving data. Now, the way to think about the algorithms and the improvement of compute efficiencies, that's table stakes. All of us, OpenAI, Anthropic, Google, et cetera, are doing this, and we're just constantly improving here.
And it's engineering and research combined. But the real differentiation between two models is the data. Compute matters tremendously for data. Because if you think about poolside, and we spoke about how we get this data, and I mentioned the word synthetic, it means that we're generating it.
It means that we're using models to generate data, to then actually use models to evaluate it, to then run it. And so compute usually matters on the side of the generation of data. But once we have all of this data, where we started today, we spoke about neural nets essentially being compression of data that forces generalization of learning. Now, when we have small models,
we are taking huge amounts of data and we're forcing this generalization of learning to happen in a very small space. And this is why we essentially see these differences in capabilities. For larger models it's essentially easier to generalize, because we're not forcing so much data into such a small compression space.
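The generate, then model-evaluate, then execute loop described above can be summarized in a short sketch. The functions `generate`, `critique`, and `run_tests` are placeholders standing in for model calls and the execution harness, not a real API.

```python
# Illustrative generate -> evaluate -> execute pipeline for synthetic data.
from typing import Callable


def build_synthetic_dataset(
    tasks: list[str],
    generate: Callable[[str], str],        # model proposes reasoning + code for a task
    critique: Callable[[str, str], bool],  # a model judges whether the attempt is plausible
    run_tests: Callable[[str], bool],      # deterministic execution feedback
) -> list[dict]:
    dataset = []
    for task in tasks:
        attempt = generate(task)          # compute-heavy step: generation
        if not critique(task, attempt):   # cheap model-based filter before execution
            continue
        if run_tests(attempt):            # keep only attempts that actually run
            dataset.append({"task": task, "solution": attempt})
    return dataset
```

Most of the compute is spent in the generation step, which matches the point above that compute matters chiefly on the data-generation side.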