Yann LeCun
Podcast Appearances
In the first few months of life, babies don't really have any influence on the world. They can only observe, right? And you accumulate a gigantic amount of knowledge just from that. So that's what we're missing from current AI systems.
Right. So because of the autoregressive prediction, every time an LLM produces a token or a word, there is some level of probability for that word to take you out of the set of reasonable answers. And suppose you assume, which is a very strong assumption, that those errors are independent across the sequence of tokens being produced.
What that means is that every time you produce a token, the probability that you stay within the set of correct answers decreases, and it decreases exponentially.
Yeah. And that drift is exponential. It's like errors accumulate, right? So the probability that an answer would be nonsensical increases exponentially with the number of tokens.
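The argument can be written down as a small calculation. If each token independently has probability e of leaving the set of reasonable answers, the chance that an n-token answer stays inside it is (1 - e)^n. The sketch below is only an illustration: the function name `p_still_correct` and the 1% error rate are assumptions made for the example, and the independence assumption is, as stated, a very strong one.

```python
def p_still_correct(e: float, n: int) -> float:
    """Probability that all n tokens stay in the set of reasonable answers,
    assuming each token errs independently with probability e."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - e) ** n


if __name__ == "__main__":
    e = 0.01  # hypothetical per-token error rate, for illustration only
    for n in (10, 100, 1000):
        print(f"n={n:5d}  P(still correct)={p_still_correct(e, n):.4f}")
```

Strictly speaking, under this model it is the probability of staying correct that decays exponentially with the number of tokens; the probability of a nonsensical answer is 1 - (1 - e)^n, which climbs toward 1.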
No, it's basically a struggle against the curse of dimensionality. So the way you can correct for this is that you fine-tune the system by having it produce answers for all kinds of questions that people might come up with. And people are people, so a lot of the questions that they have are very similar to each other, so you can probably cover 80% or whatever of questions that people will ask by collecting data. And then you fine-tune the system to produce good answers for all of those things. And it's probably going to be able to learn that because it's got a lot of capacity to learn. But then there is... you know, the enormous set of prompts that you have not covered during training. And that set is enormous.
Like within the set of all possible prompts, the proportion of prompts that have been used for training is absolutely tiny. It's a tiny, tiny, tiny subset of all possible prompts. And so the system will behave properly on the prompts that it has been either pre-trained or fine-tuned on.
But then there is an entire space of things that it cannot possibly have been trained on because the number is gigantic. So whatever training the system has been subjected to in order to produce appropriate answers, you can break it by finding a prompt that is outside the set of prompts it's been trained on, or things that are similar, and then it will just spew complete nonsense.
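To get a feel for how tiny that subset is, here is a rough back-of-the-envelope sketch. It counts every possible token sequence of a fixed length, including gibberish, so it overstates the space of meaningful prompts and should be read only as an order-of-magnitude illustration. The function name `log10_prompt_coverage` and all the numbers (a 32,000-token vocabulary, 20-token prompts, a billion training prompts) are hypothetical and do not come from the conversation.

```python
import math


def log10_prompt_coverage(num_training_prompts: int,
                          vocab_size: int,
                          prompt_length: int) -> float:
    """log10 of the fraction of all token sequences of length `prompt_length`
    that are covered by `num_training_prompts` distinct training prompts.

    Works in log space because vocab_size ** prompt_length is far too large
    for the fraction itself to be represented as a float.
    """
    if num_training_prompts <= 0 or vocab_size <= 1 or prompt_length <= 0:
        raise ValueError("all arguments must be positive (vocab_size > 1)")
    log10_total = prompt_length * math.log10(vocab_size)
    return math.log10(num_training_prompts) - log10_total


if __name__ == "__main__":
    # All three values are hypothetical, chosen only for illustration.
    exponent = log10_prompt_coverage(
        num_training_prompts=10**9, vocab_size=32_000, prompt_length=20
    )
    print(f"covered fraction is about 10^{exponent:.1f}")
```

Even with generous assumptions about the training set, the covered fraction comes out dozens of orders of magnitude below one, which is the sense in which the trained-on prompts are a tiny, tiny subset.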