Yann LeCun
And so what it's going to do is produce a probability distribution over all the possible words in your dictionary. In fact, it doesn't predict words, it predicts tokens that are kind of subword units. And so it's easy to handle the uncertainty in the prediction there, because there is only a finite number of possible words in the dictionary, and you can just compute a distribution over them.
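A minimal sketch of what "a probability distribution over all the tokens" looks like in code — the vocabulary and logits here are toy values, not from any real model; the point is just that a softmax turns the model's scores into a distribution over a finite token set:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# Toy "vocabulary" of subword tokens and made-up scores from a model.
vocab = ["the", "cat", "sat", "##s", "##ing"]
logits = np.array([2.0, 1.0, 0.5, 0.2, 0.1])

probs = softmax(logits)
# probs sums to 1: a distribution over every token in the vocabulary,
# so the uncertainty in the prediction is captured exactly.
```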
Then what the system does is that it picks a word from that distribution. Of course, there's a higher chance of picking words that have a higher probability within that distribution. So you sample from the distribution to actually produce a word. And then you shift that word into the input. And so that allows the system now to predict the second word, right?
And once you do this, you shift it into the input, et cetera. That's called autoregressive prediction, which is why those LLMs should be called autoregressive LLMs. But we just call them LLMs. And there is a difference between this kind of process and the process by which, before producing a word, when you and I talk... you and I are bilingual.
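The loop described above — sample a token, shift it into the input, predict again — can be sketched as follows; `toy_model` is a made-up stand-in for a real LLM, there only to make the autoregressive structure concrete:

```python
import numpy as np

def sample_next_token(probs, rng):
    # Sampling from the distribution: higher-probability tokens are
    # more likely to be drawn, but any token can be.
    return rng.choice(len(probs), p=probs)

def generate(model, prompt_ids, n_steps, rng):
    ids = list(prompt_ids)
    for _ in range(n_steps):
        probs = model(ids)                  # distribution over the vocabulary
        next_id = sample_next_token(probs, rng)
        ids.append(next_id)                 # "shift" the sample into the input
    return ids

# Stand-in model: strongly prefers the token after the most recent one.
VOCAB_SIZE = 5
def toy_model(ids):
    probs = np.full(VOCAB_SIZE, 0.1 / (VOCAB_SIZE - 1))
    probs[(ids[-1] + 1) % VOCAB_SIZE] = 0.9
    return probs

rng = np.random.default_rng(0)
out = generate(toy_model, [0], 4, rng)
```

Each step conditions on everything generated so far, which is exactly what makes the process autoregressive.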
We think about what we're going to say, and it's relatively independent of the language in which we're going to say it. When we talk about, I don't know, let's say a mathematical concept or something, the kind of thinking that we're doing and the answer that we're planning to produce is not linked to whether we're going to say it in French or Russian or English.
Right. It's certainly true for a lot of thinking that we do.
Yeah, pretty much.
Well, it depends what kind of thinking, right? If it's like producing puns, I'm much better in French than in English at that.