Yann LeCun
Podcast Appearances
And then you train a predictor on top of those encoders to predict the representation of the full input from the representation of the corrupted one. So joint embedding, because you're taking the full input and the corrupted version, or transformed version, run them both through encoders, so you get a joint embedding.
And then you're saying, can I predict the representation of the full one from the representation of the corrupted one? And I call this a JEPA, so that means joint embedding predictive architecture, because there's joint embedding and there is this predictor that predicts the representation of the good guy from the bad guy. And the big question is, how do you train something like this?
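The architecture described above can be sketched in a few lines. This is only a minimal illustration, not the speaker's implementation: the linear encoder shared by both inputs, the linear predictor, the `corrupt` function, and the squared-error loss are all assumptions chosen to keep the example small.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def corrupt(x, rng, drop_prob=0.5):
    """Hypothetical corruption: zero each component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def jepa_loss(x, x_corrupted, encoder_w, predictor_w):
    """Squared error between the predicted and actual representation of the full input."""
    target = linear(encoder_w, x)              # representation of the full input
    context = linear(encoder_w, x_corrupted)   # representation of the corrupted input
    predicted = linear(predictor_w, context)   # prediction happens in representation space
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

rng = random.Random(0)
encoder_w = random_matrix(3, 6, rng)     # assumed shared encoder: 6-dim input -> 3-dim embedding
predictor_w = random_matrix(3, 3, rng)   # assumed predictor: embedding -> embedding
x = [rng.uniform(-1.0, 1.0) for _ in range(6)]
loss = jepa_loss(x, corrupt(x, rng), encoder_w, predictor_w)
print(loss)
```

The point of the sketch is that the error is measured between representations, not between raw inputs. How to minimize it without the encoder degenerating is the training question raised above.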
And until five years ago, six years ago, we didn't have particularly good answers for how you train those things, except for one called contrastive learning.
And the idea of contrastive learning is you take a pair of images that are, again, an image and a corrupted version or degraded version somehow, or transformed version of the original one, and you train the predicted representation to be the same as that. If you only do this, the system collapses. It basically completely ignores the input and produces representations that are constant.
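The collapse described above is easy to reproduce on paper. In this minimal sketch, both encoders and the toy data are hypothetical. It shows that an encoder which ignores its input already scores a perfect zero on a "make the two representations the same" objective.

```python
def invariance_loss(encode, pairs):
    """Mean squared distance between the embeddings of each (image, corrupted image) pair."""
    total = 0.0
    for x, x_corrupted in pairs:
        za, zb = encode(x), encode(x_corrupted)
        total += sum((a - b) ** 2 for a, b in zip(za, zb))
    return total / len(pairs)

def collapsed_encoder(x):
    """Ignores its input entirely and always returns the same embedding."""
    return [0.5, -0.5]

def informative_encoder(x):
    """Hypothetical encoder that keeps some information about the input."""
    return [x[0] + x[1], x[0] - x[1]]

# Toy "images" (2-dim vectors), each paired with a slightly perturbed copy.
pairs = [([1.0, 2.0], [1.1, 1.9]), ([-3.0, 0.5], [-2.8, 0.4])]

print(invariance_loss(collapsed_encoder, pairs))    # 0.0: the useless encoder wins
print(invariance_loss(informative_encoder, pairs))  # greater than zero
```

Since the constant encoder minimizes this objective exactly, nothing in the objective alone pushes training away from it.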
So the contrastive methods avoid this. And those things have been around since the early 90s. I had a paper on this in 1993. You also show pairs of images that you know are different. And then you push away the representations from each other.
So you say, not only should representations of things that we know are the same be the same or similar, but representations of things that we know are different should be different. And that prevents the collapse, but it has some limitations. And there's a whole bunch of techniques that have appeared over the last six, seven years that can revive this type of method.
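One common way to write down this pull-together, push-apart idea is a margin-based pairwise loss. The sketch below assumes that form. The `margin` parameter and the Euclidean distance are illustrative choices, and this is not necessarily the formulation used in the 1993 paper mentioned above.

```python
import math

def distance(za, zb):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(za, zb)))

def contrastive_loss(za, zb, same, margin=1.0):
    """Pull embeddings of matching pairs together; push non-matching pairs at least `margin` apart."""
    d = distance(za, zb)
    if same:
        return d ** 2                     # positive pair: any separation is penalised
    return max(0.0, margin - d) ** 2      # negative pair: penalised only if closer than margin

collapsed = [0.0, 0.0]  # what a collapsed encoder would output for every image

print(contrastive_loss(collapsed, collapsed, same=True))     # 0.0
print(contrastive_loss(collapsed, collapsed, same=False))    # 1.0: collapse now costs something
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))  # 0.0: already far enough apart
```

With the negative-pair term in place, a constant representation is no longer a zero-loss solution, which is the sense in which the contrastive term prevents collapse.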
Some of them from FAIR, some of them from Google and other places. But there are limitations to those contrastive methods. What has changed in the last... you know, three, four years, is now we have methods that are non-contrastive. So they don't require those negative contrastive samples of images that we know are different.