Yann LeCun
That's right. We don't go through text. It goes directly from speech to speech using an internal representation made of discrete speech units. It's called textless NLP; that's what we used to call it. So that was an incredible success. And then, you know, for 10 years, we tried to apply this idea
to learning representations of images by training a system to predict video, and learning intuitive physics by training a system to predict what's going to happen in a video. And we tried and tried and failed and failed with generative models, with models that predict pixels. We could not get them to learn good representations of images. We could not get them to learn good representations of videos.
And we tried many times. We published lots of papers on it. They kind of sort of worked, but not really well. It started working when we abandoned this idea of predicting every pixel and switched to basically joint embedding: predicting in representation space. That works.
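The contrast being described can be sketched in a few lines of NumPy. This is a toy illustration, not LeCun's actual architecture: the "encoder" and "predictor" here are hypothetical fixed random projections standing in for trained networks. The point is where the loss is computed: a generative model is penalized for every pixel of the next frame, including unpredictable detail, while a joint-embedding model only has to predict the next frame's abstract representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frames": flattened 8x8 images (64 pixels each).
frame_t = rng.normal(size=64)
frame_next = rng.normal(size=64)

# Hypothetical encoder: a fixed random projection into a 16-dim latent space.
# In a real joint-embedding setup this would be a trained network.
W_enc = rng.normal(size=(16, 64)) / 8.0

# Hypothetical predictor that operates entirely in representation space.
W_pred = rng.normal(size=(16, 16)) / 4.0

z_t = W_enc @ frame_t          # representation of the current frame
z_next = W_enc @ frame_next    # representation of the next frame
z_hat = W_pred @ z_t           # predicted representation of the next frame

# Generative (pixel-space) objective: decode back to pixels and explain
# every one of them, unpredictable detail included.
pixel_loss = np.mean((frame_next - W_enc.T @ z_hat) ** 2)

# Joint-embedding objective: only predict the abstract representation.
latent_loss = np.mean((z_next - z_hat) ** 2)

print(f"pixel-space loss:  {pixel_loss:.4f}")
print(f"latent-space loss: {latent_loss:.4f}")
```

In a real system the encoder and predictor are trained jointly (with extra machinery to stop the representations from collapsing to a constant); the sketch only shows which quantity each objective asks the model to match.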
So there's ample evidence that we're not going to be able to learn good representations of the real world using generative models. So I'm telling people: everybody's talking about generative AI, but if you're really interested in human-level AI, abandon the idea of generative AI.
Right. Well, there are a lot of situations that might be difficult for a purely language-based system to know about. Like, okay, you can probably learn from reading text, the entirety of the publicly available text in the world, that I cannot get from New York to Paris by snapping my fingers. That's not going to work, right? Yes.
But there are probably more complex scenarios of this type, which an LLM may never have encountered and may not be able to determine whether it's possible or not. So that link from the low level to the high level is missing. The thing is that the high level that language expresses is based on a common experience of the low level, which LLMs currently do not have.
When we talk to each other, we know we have a common experience of the world. A lot of it is similar.