Yann LeCun
And then you're saying, can I predict the representation of the full one from the representation of the corrupted one? And I call this a JEPA, so that means joint embedding predictive architecture, because there's joint embedding and there is this predictor that predicts the representation of the good guy from the bad guy. And the big question is, how do you train something like this?
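As a rough sketch of the structure being described, here is a hypothetical PyTorch module; the encoders, dimensions, and names are illustrative assumptions, not an actual JEPA implementation:

```python
import torch
import torch.nn as nn

class JEPA(nn.Module):
    """Minimal joint embedding predictive architecture sketch."""
    def __init__(self, dim=256, in_features=784):
        super().__init__()
        # One encoder for the corrupted view, one for the full view.
        self.encoder_x = nn.Sequential(nn.Flatten(), nn.Linear(in_features, dim))
        self.encoder_y = nn.Sequential(nn.Flatten(), nn.Linear(in_features, dim))
        # The predictor maps the corrupted view's representation
        # to (a prediction of) the full view's representation.
        self.predictor = nn.Linear(dim, dim)

    def forward(self, x_corrupted, y_full):
        sx = self.encoder_x(x_corrupted)  # representation of the corrupted input
        sy = self.encoder_y(y_full)       # representation of the full input
        return self.predictor(sx), sy     # prediction happens in representation space
```

Naively training the prediction to match `sy` is exactly what invites the collapse he raises next.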
And until five years ago, six years ago, we didn't have particularly good answers for how you train those things, except for one called contrastive learning.
And the idea of contrastive learning is you take a pair of images that are, again, an image and a corrupted, degraded, or transformed version of the original one, and you train the predicted representation to be the same as the representation of the original. If you only do this, the system collapses. It basically completely ignores the input and produces representations that are constant.
So the contrastive methods avoid this. And those things have been around since the early 90s. I had a paper on this in 1993. You also show pairs of images that you know are different. And then you push away the representations from each other.
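One common form of this pairwise contrastive loss looks like the following; this is a sketch of the general recipe, not the exact 1993 formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """Attract pairs labeled the same; repel pairs labeled different
    until their representations are at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)                  # distance between representations
    pos = same * d.pow(2)                            # pull matching pairs together
    neg = (1.0 - same) * F.relu(margin - d).pow(2)   # push non-matching pairs apart
    return (pos + neg).mean()
```

Here `same` is 1 for pairs known to be the same thing and 0 for pairs known to be different; the second term does the pushing apart.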
So you say, not only should representations of things that we know are the same be the same or similar, but representations of things that we know are different should be different. And that prevents the collapse, but it has some limitations. And there's a whole bunch of techniques that have appeared over the last six, seven years that revive this type of method.
Some of them from FAIR, some of them from Google and other places. But there are limitations to those contrastive methods. What has changed in the last three, four years is that now we have methods that are non-contrastive. So they don't require those negative contrastive samples of images that we know are different.
You train them only with images that are different versions or different views of the same thing. And you rely on some other tweaks to prevent the system from collapsing. And we have half a dozen different methods for this now.
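One of the non-contrastive recipes from FAIR is VICReg; the following is a simplified sketch in its spirit, with the loss weights as illustrative defaults rather than tuned values:

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """Non-contrastive loss sketch: no negative pairs needed.
    Variance and covariance terms keep representations from collapsing."""
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)      # the two views should map to the same point

    def variance(z):                     # keep each dimension's variance above a floor
        std = torch.sqrt(z.var(dim=0) + 1e-4)
        return F.relu(1.0 - std).mean()

    def covariance(z):                   # decorrelate the dimensions
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d

    return (sim_w * invariance
            + var_w * (variance(z1) + variance(z2))
            + cov_w * (covariance(z1) + covariance(z2)))
```

The variance and covariance terms are the "other tweaks": they make a constant representation expensive, so no negative pairs are needed.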
Well, so it's a first step. So first of all, what's the difference from generative architectures like LLMs? LLMs, or vision systems that are trained by reconstruction, generate the inputs. They generate the original input, non-corrupted and non-transformed. So you have to predict all the pixels.
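To make the contrast concrete, here is a hypothetical side-by-side of the two objectives; the function names and arguments are illustrative assumptions:

```python
import torch.nn.functional as F

# Generative / reconstruction objective: decode back to pixel space
# and pay a cost on every pixel of the original input.
def reconstruction_loss(decoder, s_corrupted, y_pixels):
    return F.mse_loss(decoder(s_corrupted), y_pixels)

# JEPA objective: predict only in representation space; no pixels are generated.
# (The stop-gradient on the target, via .detach(), is one of the
# anti-collapse tweaks used in practice in some non-contrastive methods.)
def jepa_loss(predictor, s_corrupted, s_full):
    return F.mse_loss(predictor(s_corrupted), s_full.detach())
```

The generative objective pays a cost on every pixel; the JEPA objective never leaves representation space.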