Yann LeCun
But then, how do you link this with the low level of actions? That needs to be done with things like JEPA, which basically lifts the abstraction level of the representation without attempting to reconstruct every detail of the situation. That's why we need JEPAs for it.
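To make the contrast concrete, here is a minimal sketch, assuming a PyTorch-style setup, of a joint-embedding predictive objective. The module names and dimensions are illustrative, not taken from any actual JEPA implementation; the point is only that the prediction and the loss live in representation space, so nothing pushes the model to reconstruct every low-level detail of the input.

```python
import torch
import torch.nn as nn

# Toy JEPA-style objective: predict the *representation* of the target,
# never the raw target itself (hypothetical module names and sizes).
class ToyJEPA(nn.Module):
    def __init__(self, in_dim=128, emb_dim=64):
        super().__init__()
        self.context_encoder = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.target_encoder = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.predictor = nn.Linear(emb_dim, emb_dim)

    def forward(self, context, target):
        s_x = self.context_encoder(context)      # abstract representation of the context
        with torch.no_grad():                    # target branch only provides targets here
            s_y = self.target_encoder(target)    # abstract representation of the target
        pred = self.predictor(s_x)               # prediction happens in embedding space
        return ((pred - s_y) ** 2).mean()        # loss in embedding space, not pixel space

model = ToyJEPA()
context, target = torch.randn(8, 128), torch.randn(8, 128)
loss = model(context, target)
loss.backward()
```

In real JEPA-style training the target encoder is typically a moving average of the context encoder, with extra machinery to prevent representational collapse; the sketch only shows where the loss is computed.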
No, there's one thing that autoregressive LLMs, or LLMs in general, not just the autoregressive ones but including the BERT-style bidirectional ones, are exploiting, and that's self-supervised learning. And I've been a very, very strong advocate of self-supervised learning for many years. So those things are an incredibly impressive demonstration that self-supervised learning actually works.
It didn't start with BERT, but BERT was really a good demonstration of this. The idea that you take a piece of text, you corrupt it, and then you train some gigantic neural net to reconstruct the parts that are missing has produced an enormous amount of benefits. It allowed us to create systems that understand language, systems that can translate hundreds of languages in any direction,
systems that are multilingual, so a single system can be trained to understand hundreds of languages and translate in any direction, and produce summaries, and then answer questions and produce text.
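A minimal sketch of that corrupt-and-reconstruct recipe, assuming a toy PyTorch setup; the vocabulary size, mask rate, and reserved mask id are illustrative, not BERT's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy BERT-style denoising objective: corrupt some tokens, train the model
# to reconstruct only the corrupted positions.
vocab_size, emb_dim, seq_len, batch = 1000, 64, 16, 8
MASK_ID = 0  # hypothetical id reserved for the [MASK] token

embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(emb_dim, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(emb_dim, vocab_size)

tokens = torch.randint(1, vocab_size, (batch, seq_len))   # stand-in for real text
mask = torch.rand(batch, seq_len) < 0.15                  # corrupt ~15% of positions
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = head(encoder(embed(corrupted)))                  # bidirectional: sees the whole corrupted text
loss = F.cross_entropy(logits[mask], tokens[mask])        # reconstruct only the missing parts
loss.backward()
```

Note that every position here can look at the entire corrupted text; that bidirectionality is exactly what the autoregressive special case below removes.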
And then there's a special case of it, which is the autoregressive trick, where you constrain the system to not elaborate a representation of the text by looking at the entire text, but to predict a word only from the words that come before it. You do this by constraining the architecture of the network, and that's what you build an autoregressive LLM from.
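That architectural constraint is just a causal attention mask. A minimal sketch, again assuming PyTorch, with illustrative sizes:

```python
import torch
import torch.nn as nn

# Causal mask: True above the diagonal means "position t may not attend to
# any later position", so each token is predicted only from earlier tokens.
seq_len, emb_dim = 6, 64
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(emb_dim, nhead=4, batch_first=True)
x = torch.randn(2, seq_len, emb_dim)        # already-embedded tokens
out = layer(x, src_mask=causal_mask)        # position t only sees positions <= t

# Training then pairs out[:, :-1] with the tokens at positions 1..seq_len-1,
# i.e. predict the next word from the words that come before it.
```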
So there was a surprise many years ago with what's called decoder-only LLMs, systems of this type that just try to produce words from the previous ones. The fact that when you scale them up, when you train them on lots of data and make them really big, they tend to really understand more about language, that was kind of a surprise.
And that surprise occurred quite a while back, with work from Google, Meta, OpenAI, et cetera, going back to the GPT kind of work, generative pre-trained transformers.