Yann LeCun
๐ค SpeakerVoice Profile Active
This person's voice can be automatically recognized across podcast episodes using AI voice matching.
Appearances Over Time
Podcast Appearances
And then we started working on what we used to call unsupervised learning and sort of revived the idea of unsupervised learning in the early 2000s with Yoshua Bengio and Jeff Hinton. Then discovered that supervised learning actually works pretty well if you can collect enough data. And so the whole idea of unsupervised self-supervised learning kind of took a backseat for a bit.
And then we started working on what we used to call unsupervised learning and sort of revived the idea of unsupervised learning in the early 2000s with Yoshua Bengio and Jeff Hinton. Then discovered that supervised learning actually works pretty well if you can collect enough data. And so the whole idea of unsupervised self-supervised learning kind of took a backseat for a bit.
And then we started working on what we used to call unsupervised learning and sort of revived the idea of unsupervised learning in the early 2000s with Yoshua Bengio and Jeff Hinton. Then discovered that supervised learning actually works pretty well if you can collect enough data. And so the whole idea of unsupervised self-supervised learning kind of took a backseat for a bit.
And then I kind of tried to revive it in a big way, starting in 2014, basically when we started FAIR. and really pushing for finding new methods to do self-supervised learning, both for text and for images and for video and audio. And some of that work has been incredibly successful.
And then I kind of tried to revive it in a big way, starting in 2014, basically when we started FAIR. and really pushing for finding new methods to do self-supervised learning, both for text and for images and for video and audio. And some of that work has been incredibly successful.
And then I kind of tried to revive it in a big way, starting in 2014, basically when we started FAIR. and really pushing for finding new methods to do self-supervised learning, both for text and for images and for video and audio. And some of that work has been incredibly successful.
I mean, the reason why we have a multilingual translation system, you know, things to do content moderation on Meta, for example, on Facebook, that are multilingual, that understand whether a piece of text is hate speech or not or something. is due to that progress using self-supervised learning for NLP, combining this with transformer architectures and blah, blah, blah.
I mean, the reason why we have a multilingual translation system, you know, things to do content moderation on Meta, for example, on Facebook, that are multilingual, that understand whether a piece of text is hate speech or not or something. is due to that progress using self-supervised learning for NLP, combining this with transformer architectures and blah, blah, blah.
I mean, the reason why we have a multilingual translation system, you know, things to do content moderation on Meta, for example, on Facebook, that are multilingual, that understand whether a piece of text is hate speech or not or something. is due to that progress using self-supervised learning for NLP, combining this with transformer architectures and blah, blah, blah.
But that's the big success of self-supervised learning. We had similar success in speech recognition, a system called Wave2Vec, which is also a joint embedding architecture, by the way, trained with contrastive learning. And that system also can produce speech recognition systems that are multilingual,
But that's the big success of self-supervised learning. We had similar success in speech recognition, a system called Wave2Vec, which is also a joint embedding architecture, by the way, trained with contrastive learning. And that system also can produce speech recognition systems that are multilingual,
But that's the big success of self-supervised learning. We had similar success in speech recognition, a system called Wave2Vec, which is also a joint embedding architecture, by the way, trained with contrastive learning. And that system also can produce speech recognition systems that are multilingual,
with mostly unlabeled data and only need a few minutes of labeled data to actually do speech recognition. That's amazing. We have systems now based on those combination of ideas that can do real-time translation of hundreds of languages into each other.
with mostly unlabeled data and only need a few minutes of labeled data to actually do speech recognition. That's amazing. We have systems now based on those combination of ideas that can do real-time translation of hundreds of languages into each other.
with mostly unlabeled data and only need a few minutes of labeled data to actually do speech recognition. That's amazing. We have systems now based on those combination of ideas that can do real-time translation of hundreds of languages into each other.
That's right. We don't go through text. It goes directly from speech-to-speech using an internal representation of kind of speech units that are discrete. But it's called textless NLP. We used to call it this way. But yeah, so that, I mean, incredible success there. And then, you know, for 10 years, we tried to apply this idea
That's right. We don't go through text. It goes directly from speech-to-speech using an internal representation of kind of speech units that are discrete. But it's called textless NLP. We used to call it this way. But yeah, so that, I mean, incredible success there. And then, you know, for 10 years, we tried to apply this idea
That's right. We don't go through text. It goes directly from speech-to-speech using an internal representation of kind of speech units that are discrete. But it's called textless NLP. We used to call it this way. But yeah, so that, I mean, incredible success there. And then, you know, for 10 years, we tried to apply this idea
to learning representations of images by training a system to predict videos, learning intuitive physics by training a system to predict what's going to happen in a video, and tried and tried and failed and failed with generative models, with models that predict pixels. We could not get them to learn good representations of images. We could not get them to learn good representations of videos.
to learning representations of images by training a system to predict videos, learning intuitive physics by training a system to predict what's going to happen in a video, and tried and tried and failed and failed with generative models, with models that predict pixels. We could not get them to learn good representations of images. We could not get them to learn good representations of videos.