Yann LeCun
Podcast Appearances
You only train the part of the network that is fed with the corrupted input. The other network you don't train, but since they share the same weights, when you modify the first one, it also modifies the second one. And with various tricks, you can prevent the system from collapsing, the kind of collapse I was explaining before, where the system basically ignores the input.
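[Editor's sketch] To make the collapse mentioned above concrete, here is a minimal NumPy illustration. It assumes a hypothetical `collapsed_encoder` that ignores its input and a plain mean-squared error in representation space; neither comes from the conversation. It only shows why a system that ignores the input trivially satisfies the training objective.

```python
import numpy as np

def collapsed_encoder(x, dim=8):
    # Hypothetical degenerate encoder: ignores the input and returns
    # the same constant vector for every sample.
    return np.zeros((x.shape[0], dim))

def representation_loss(predicted, target):
    # Mean squared error between two batches of representations.
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 32))
corrupted = clean + rng.normal(size=clean.shape)

# The loss is zero even though the encoder has learned nothing.
loss = representation_loss(collapsed_encoder(corrupted), collapsed_encoder(clean))

# One simple symptom of collapse: no variation across samples.
spread = float(collapsed_encoder(clean).std(axis=0).mean())
```

Both `loss` and `spread` come out as zero here, which is the failure the "various tricks" are meant to rule out.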
So that works very well. The two techniques we've developed at FAIR, DINO and I-JEPA, work really well for that.
So there's several scenarios. One scenario is you take an image, you corrupt it by changing the cropping, for example, changing the size a little bit, maybe changing the orientation, blurring it, changing the colors, doing all kinds of horrible things to it.
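[Editor's sketch] The image-specific corruptions listed above could look roughly like the following. It assumes images are float arrays in [0, 1] with shape (height, width, 3), and it covers only cropping, flipping, and a color change. The function name `corrupt` and the parameter ranges are illustrative, not taken from any particular library or from the speaker.

```python
import numpy as np

def corrupt(image, rng, crop=24):
    """Return a randomly cropped, possibly flipped, color-shifted view."""
    h, w, _ = image.shape
    # Random crop: changes the framing of the image.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    view = image[top:top + crop, left:left + crop]
    # Random horizontal flip: changes the orientation.
    if rng.random() < 0.5:
        view = view[:, ::-1]
    # Per-channel gain: changes the colors.
    gain = rng.uniform(0.6, 1.4, size=(1, 1, 3))
    return np.clip(view * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
view = corrupt(image, rng)
```

Every step here relies on knowing the input is an image, which is the contrast with masking that comes up later in the conversation.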
Basic horrible things that sort of degrade the quality a little bit and change the framing, you know, crop the image. And in some cases, in the case of I-JEPA, you don't need to do any of this. You just mask some parts of it, right? You just basically remove some regions, like a big block, essentially. Yeah.
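[Editor's sketch] Removing "a big block" can be illustrated in a few lines of NumPy. This is a simplification that masks raw pixels with a fill value; the function name `mask_block` and its arguments are hypothetical, and real systems may mask patches or tokens instead.

```python
import numpy as np

def mask_block(image, top, left, height, width, fill=0.0):
    """Return a copy of `image` with one rectangular block replaced by
    `fill`, plus a boolean mask marking the removed pixels."""
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[top:top + height, left:left + width] = True
    masked[mask] = fill
    return masked, mask

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
masked, mask = mask_block(image, top=8, left=8, height=16, width=16)
```

Nothing in `mask_block` depends on the content being a photograph; it only needs a grid to cut a block out of.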
And then you run both through the encoders and train the entire system, encoder and predictor, to predict the representation of the good one from the representation of the corrupted one. So that's the I-JEPA. It doesn't need to know that it's an image, for example, because the only thing it needs to know is how to do this masking.
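[Editor's sketch] The objective described here, predicting the representation of the good input from the representation of the corrupted one, can be written down as a forward pass. The sketch assumes a toy one-layer encoder, a linear predictor, flat input vectors, and a squared-error loss; the names `W_enc`, `W_pred`, and `jepa_loss` are illustrative. No gradient step is shown. In an autodiff framework the target branch would be treated as a constant, so only the corrupted branch is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_REP = 64, 16

# Both branches use the same encoder weights, as described in the text.
W_enc = rng.normal(scale=0.1, size=(D_IN, D_REP))
W_pred = rng.normal(scale=0.1, size=(D_REP, D_REP))

def encode(x):
    # Toy encoder: one linear layer followed by tanh.
    return np.tanh(x @ W_enc)

def predict(s):
    # Toy predictor: a single linear map in representation space.
    return s @ W_pred

def jepa_loss(clean, corrupted):
    target = encode(clean)                   # representation of the good input
    predicted = predict(encode(corrupted))   # prediction from the corrupted input
    return float(np.mean((predicted - target) ** 2))

clean = rng.normal(size=(4, D_IN))
corrupted = clean.copy()
corrupted[:, 16:48] = 0.0                    # mask a contiguous chunk of each input
loss = jepa_loss(clean, corrupted)
```

The loss is computed between representations, not between raw inputs, so the system is never asked to reconstruct the masked content itself.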
Whereas with DINO, you need to know it's an image because you need to do things like geometry transformation and blurring and things like that that are really image-specific. A more recent version of this that we have is called V-JEPA. So it's basically the same idea as I-JEPA, except it's applied to video. So now you take a whole video and you mask a whole chunk of it.
And what we mask is actually kind of a temporal tube. So like a whole segment of each frame in the video over the entire video. Mm-hmm.
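[Editor's sketch] A "temporal tube" can be pictured as the same spatial block removed from every frame. The sketch below assumes a video stored as a float array of shape (frames, height, width, channels). The function name `mask_tube` and the pixel-level masking are simplifications for illustration.

```python
import numpy as np

def mask_tube(video, top, left, height, width, fill=0.0):
    """Mask the same spatial block in every frame of `video`."""
    masked = video.copy()
    mask = np.zeros(video.shape[:3], dtype=bool)
    # The slice over the first axis covers all frames, which is what
    # turns a 2D block into a tube through time.
    mask[:, top:top + height, left:left + width] = True
    masked[mask] = fill
    return masked, mask

rng = np.random.default_rng(0)
video = rng.random((8, 16, 16, 3))
masked_video, tube = mask_tube(video, top=4, left=4, height=8, width=8)
```

Because the block sits in the same place in every frame, the missing region cannot simply be copied from a neighboring frame.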