Yann LeCun
Podcast Appearances
And they do this optimization in a horribly inefficient way, which is generate a lot of hypotheses and then select the best ones. And that's incredibly wasteful in terms of computation. Because you basically have to run your LLM for every possible generated sequence. And it's incredibly wasteful.
So it's much better to do an optimization in continuous space where you can do gradient descent as opposed to like generate tons of things and then select the best. You just iteratively refine your answer to go towards the best, right? That's much more efficient. But you can only do this in continuous spaces with differentiable functions.
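The contrast between the two strategies can be sketched on a toy problem. This is only an illustration under stated assumptions: the quadratic `energy`, the value of `TARGET`, and the function names are made up for the example, and a one-dimensional number stands in for the continuous representation being discussed.

```python
import random

TARGET = 3.0  # hypothetical best answer, chosen only for this illustration


def energy(y):
    """Differentiable score: lower is better, zero at the best answer."""
    return (y - TARGET) ** 2


def grad_energy(y):
    """Derivative of the energy with respect to y."""
    return 2.0 * (y - TARGET)


def generate_and_select(num_candidates, seed=0):
    """Draw many candidate answers at random and keep the lowest-energy one."""
    rng = random.Random(seed)
    candidates = [rng.uniform(-10.0, 10.0) for _ in range(num_candidates)]
    return min(candidates, key=energy)


def refine(y=0.0, lr=0.1, steps=50):
    """Start from one guess and repeatedly step downhill along the gradient."""
    for _ in range(steps):
        y -= lr * grad_energy(y)
    return y


print("generate-and-select:", generate_and_select(50))
print("gradient refinement:", refine())
```

Both routines evaluate the toy function 50 times. The first spends those evaluations on unrelated guesses, while the second uses each one to improve the same answer, which is only possible because `energy` is differentiable in `y`.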
Right. So then we're asking the question of, conceptually, how do you train an energy-based model? So an energy-based model is a function with a scalar output, just a number. You give it two inputs, x and y, and it tells you whether y is compatible with x or not. x you observe. Let's say it's a prompt, an image, a video, whatever.
And y is a proposal for an answer, a continuation of the video, you know, whatever. And it tells you whether y is compatible with x. And the way it tells you that y is compatible with x is that the output of that function would be zero if y is compatible with x. It would be a positive number, non-zero, if y is not compatible with x.
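As a minimal sketch of that interface, here is a hand-written energy function rather than a trained network. The compatibility rule `y = 2 * x` is an assumption invented for the example; the only point is the shape of the function: two inputs, one scalar output, zero for a compatible pair and positive otherwise.

```python
def energy(x: float, y: float) -> float:
    """Toy energy: zero when y is compatible with x, positive otherwise.

    The rule "y is compatible with x when y == 2 * x" is made up
    for this illustration.
    """
    return (y - 2.0 * x) ** 2


print(energy(1.5, 3.0))  # compatible pair
print(energy(1.5, 4.0))  # incompatible pair
```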
Okay, how do you train a system like this? At a completely general level, you show it pairs of x and y that are compatible, a question and the corresponding answer, and you train the parameters of the big neural net inside to produce zero.
Okay, now that doesn't completely work because the system might decide, well, I'm just going to say zero for everything. So now you have to have a process to make sure that for a wrong y, the energy would be larger than zero. And there you have two options. One is contrastive methods.
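The failure mode described here, a system that says zero for everything, can be sketched as follows. The two energy functions, the rule `y = 2 * x`, and the sample pairs are assumptions made up for the example. It shows that an objective that only looks at compatible pairs cannot tell a useful energy function from a collapsed one.

```python
def positive_only_loss(energy_fn, good_pairs):
    """Objective that only asks for zero energy on compatible pairs."""
    return sum(energy_fn(x, y) for x, y in good_pairs)


def useful_energy(x, y):
    """Zero only when y matches the made-up rule y = 2 * x."""
    return (y - 2.0 * x) ** 2


def collapsed_energy(x, y):
    """Degenerate solution: says zero for everything."""
    return 0.0


good_pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
wrong_pair = (1.0, 5.0)

# Both functions look perfect when judged on compatible pairs alone.
print(positive_only_loss(useful_energy, good_pairs))
print(positive_only_loss(collapsed_energy, good_pairs))

# Only a wrong y reveals the difference between them.
print(useful_energy(*wrong_pair), collapsed_energy(*wrong_pair))
```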
So a contrastive method is you show an x and a bad y, and you tell the system, well, give a high energy to this, like push up the energy, right? Change the weights in the neural net that computes the energy so that it goes up. So that's contrastive methods.
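A rough sketch of that push-down, push-up recipe on a toy model is below. Everything specific is an assumption for illustration and not something stated in the conversation: the two-weight model `(a * x + b * y) ** 2` stands in for the big neural net, the bad pairs are picked by hand, and the `margin` that stops the pushing up once a bad pair's energy is high enough is one common way to write a contrastive loss, not necessarily the one meant here.

```python
def energy(w, x, y):
    """Scalar energy of the pair (x, y): square of a linear score."""
    a, b = w
    return (a * x + b * y) ** 2


def grad_energy(w, x, y):
    """Gradient of the energy with respect to the weights (a, b)."""
    a, b = w
    score = a * x + b * y
    return (2.0 * score * x, 2.0 * score * y)


def train_contrastive(good_pairs, bad_pairs, w=(0.1, 0.0),
                      lr=0.01, margin=1.0, steps=200):
    a, b = w
    for _ in range(steps):
        ga, gb = 0.0, 0.0
        # Push down: lower the energy of compatible pairs.
        for x, y in good_pairs:
            da, db = grad_energy((a, b), x, y)
            ga += da
            gb += db
        # Push up: raise the energy of bad pairs until it reaches the margin.
        for x, y in bad_pairs:
            if energy((a, b), x, y) < margin:
                da, db = grad_energy((a, b), x, y)
                ga -= da
                gb -= db
        a -= lr * ga
        b -= lr * gb
    return (a, b)


# Hypothetical data: compatible answers follow y = 2 * x, bad ones do not.
good_pairs = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
bad_pairs = [(x, -0.5 * x) for x in (1.0, 2.0, 3.0)]

w = train_contrastive(good_pairs, bad_pairs)
for (x, y_good), (_, y_bad) in zip(good_pairs, bad_pairs):
    print(x, energy(w, x, y_good), energy(w, x, y_bad))
```

After training, the printed energy for each compatible pair should be close to zero, and the energy for each bad pair should sit at or above the margin.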