Yann LeCun
So it's much better to do optimization in a continuous space where you can do gradient descent, as opposed to generating tons of things and then selecting the best. You just iteratively refine your answer to move toward the best one, right? That's much more efficient. But you can only do this in continuous spaces with differentiable functions.
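A minimal sketch of the contrast being drawn here, using a made-up quadratic energy over a 3-dimensional answer vector (the energy function and `y_star` are illustrative, not from the conversation):

```python
import numpy as np

# Hypothetical energy over a continuous answer y, minimized at y_star.
y_star = np.array([1.0, -2.0, 0.5])

def energy(y):
    return float(np.sum((y - y_star) ** 2))

def energy_grad(y):
    return 2.0 * (y - y_star)

# Generate-and-select: sample many candidates, keep the lowest-energy one.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(1000, 3))
best_sampled = min(candidates, key=energy)

# Gradient descent: iteratively refine a single candidate instead.
y = np.zeros(3)
for _ in range(200):
    y = y - 0.1 * energy_grad(y)

# Refinement drives the energy to (numerically) zero; sampling typically does not.
```

With 1,000 random samples the best candidate usually still has non-zero energy, while 200 gradient steps refine one candidate to machine precision; that is the efficiency gap being described, and it only works because the energy here is a differentiable function of a continuous y.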
Right. So then we're asking the question of, conceptually, how do you train an energy-based model? An energy-based model is a function with a scalar output, just a number. You give it two inputs, x and y, and it tells you whether y is compatible with x or not. x is what you observe. Let's say it's a prompt, an image, a video, whatever.
And y is a proposal for an answer, a continuation of the video, whatever. And it tells you whether y is compatible with x. The way it tells you is that the output of the function is zero if y is compatible with x, and a positive, non-zero number if y is not compatible with x.
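As a toy illustration of that interface (the compatibility rule y = 2x is invented here purely to make the example concrete):

```python
import numpy as np

# Toy energy-based model: scalar output, zero when y is compatible with x,
# positive otherwise. "Compatible" here means y == 2*x (an invented rule).
def energy(x, y):
    return float(np.sum((y - 2.0 * x) ** 2))  # always a single number >= 0

x = np.array([1.0, 3.0])
print(energy(x, np.array([2.0, 6.0])))  # 0.0  -> y compatible with x
print(energy(x, np.array([0.0, 0.0])))  # 40.0 -> y incompatible with x
```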
Okay, how do you train a system like this, at a completely general level? You show it pairs of x and y that are compatible, a question and the corresponding answer, and you train the parameters of the big neural net inside to produce zero.
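A sketch of that recipe on a one-parameter energy E_w(x, y) = (y - w*x)^2, shown only compatible pairs. The ground-truth rule y = 3x, the learning rate, and the iteration count are all invented for illustration:

```python
import numpy as np

# Compatible (x, y) pairs from a hypothetical ground-truth rule y = 3*x.
rng = np.random.default_rng(0)
xs = rng.uniform(1.0, 2.0, size=100)
ys = 3.0 * xs

# Parameterized energy E_w(x, y) = (y - w*x)^2. Train w so that every
# compatible pair gets energy zero.
w, lr = 0.0, 0.05
for _ in range(500):
    for x, y in zip(xs, ys):
        # dE/dw = -2*x*(y - w*x); step downhill on the energy itself
        w -= lr * (-2.0 * x * (y - w * x))

# w ends up near 3.0, the value at which all training pairs have zero energy.
```

Note that nothing in this loop ever sees an incompatible pair, which is exactly why the degenerate solution discussed next becomes a concern for a richer model.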
Okay, now that doesn't completely work, because the system might decide, well, I'm just going to say zero for everything. So you have to have a process to make sure that for a wrong y, the energy is larger than zero. And there you have two options. One is contrastive methods.
So the contrastive method is: you show an x and a bad y, and you tell the system, give a high energy to this, push up the energy, right? Change the weights in the neural net that computes the energy so that it goes up. So those are contrastive methods.
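One common way to implement "push down on good pairs, push up on bad pairs" is a margin loss. Here is a sketch on the same kind of one-parameter energy; the data rule y = 3x, the way bad answers are corrupted, and the margin value are all invented for illustration:

```python
import numpy as np

def energy(w, x, y):
    return (y - w * x) ** 2  # E_w(x, y), scalar and >= 0

rng = np.random.default_rng(1)
xs = rng.uniform(1.0, 2.0, size=100)
ys_good = 3.0 * xs                             # compatible answers
ys_bad = ys_good + rng.normal(2.0, 0.5, 100)   # corrupted, incompatible answers

w, lr, margin = 0.0, 0.02, 1.0
for _ in range(300):
    for x, yp, yn in zip(xs, ys_good, ys_bad):
        # push DOWN the energy of the compatible pair (descend on E)
        w -= lr * (-2.0 * x * (yp - w * x))
        # push UP the energy of the bad pair, but only while it sits below the margin
        if energy(w, x, yn) < margin:
            w += lr * (-2.0 * x * (yn - w * x))

# After training, good pairs sit near zero energy and bad pairs above them.
```

The margin test is one standard design choice: once a bad pair's energy is already high enough, there is no need to keep pushing, which is what keeps the loss from diverging.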
The problem with this is that if the space of y is large, the number of such contrastive samples you have to show is gigantic. But people do this. They do it when you train a system with RLHF. Basically, what you're training is what's called a reward model, which is an objective function that tells you whether an answer is good or bad. And that's basically exactly what