Yann LeCun
And that not only is a lot simpler, but also it allows the system to essentially learn an abstract representation of the world where what can be modeled and predicted is preserved, and the rest is viewed as noise and eliminated by the encoder. So it kind of lifts the level of abstraction of the representation. If you think about this, this is something we do absolutely all the time.
Whenever we describe a phenomenon, we describe it at a particular level of abstraction. We don't always describe every natural phenomenon in terms of quantum field theory. That would be impossible.
We have multiple levels of abstraction to describe what happens in the world, starting from quantum field theory to atomic theory and molecules and chemistry, materials, and all the way up to concrete objects in the real world and things like that. We can't just only model everything at the lowest level. That's what the idea of JEPA is really about.
Learn abstract representation in a self-supervised manner. You can do it hierarchically as well. That, I think, is an essential component of an intelligent system. In language, we can get away without doing this because language is already, to some level, abstract, and already has eliminated a lot of information that is not predictable.
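The idea described above — predict in representation space rather than in input space, so the encoder can discard unpredictable detail — can be sketched in a few lines. This is a minimal, hypothetical numpy illustration of a JEPA-style objective, not Meta's actual implementation; the encoder, predictor, and all names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # Toy linear encoder: the low-dimensional projection is where
    # "lifting the level of abstraction" happens in this sketch.
    return W @ x

def jepa_loss(x, y, W_enc, W_pred):
    # Encode both the context x and the target y, then predict the
    # representation of y from the representation of x. The loss is
    # measured in representation space, never on raw inputs.
    sx = encoder(x, W_enc)
    sy = encoder(y, W_enc)
    sy_hat = W_pred @ sx
    return float(np.mean((sy_hat - sy) ** 2))

# Toy data: y is x plus noise, so the predictable part is the shared
# structure and the noise is what the encoder is free to eliminate.
d, k = 16, 4
x = rng.normal(size=d)
y = x + 0.1 * rng.normal(size=d)
W_enc = rng.normal(size=(k, d)) / np.sqrt(d)
W_pred = np.eye(k)

loss = jepa_loss(x, y, W_enc, W_pred)
print(loss)
```

In a real joint-embedding system the encoder and predictor are deep networks trained jointly, with extra machinery to prevent the trivial solution where the encoder maps everything to a constant; this sketch only shows where the prediction and the loss live.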
So we can get away without doing the joint embedding, without lifting the abstraction level, and by directly predicting words.
Right. And the thing is, those self-supervised algorithms that learn by prediction, even in representation space, they learn more concepts if the input data you feed them is more redundant. The more redundancy there is in the data, the more they're able to capture some internal structure of it.
And so there is way more redundancy and structure in perceptual, sensory input, like vision, than there is in text, which is not nearly as redundant. This is back to the question you were asking a few minutes ago. Language might represent more information precisely because it's already compressed. You're right about that, but that means it's also less redundant.
And so self-supervised learning will not work as well.