Yann LeCun
👤 PersonAppearances Over Time
Podcast Appearances
Okay, but then that assumes that those systems actually possess an eternal world model.
Okay, but then that assumes that those systems actually possess an eternal world model.
Yeah. So can you build this, first of all, by prediction? Right. And the answer is probably yes. Can you build it by predicting words? And the answer is most probably no, because language is very poor in terms of weak or low bandwidth, if you want. There's just not enough information there.
Yeah. So can you build this, first of all, by prediction? Right. And the answer is probably yes. Can you build it by predicting words? And the answer is most probably no, because language is very poor in terms of weak or low bandwidth, if you want. There's just not enough information there.
Yeah. So can you build this, first of all, by prediction? Right. And the answer is probably yes. Can you build it by predicting words? And the answer is most probably no, because language is very poor in terms of weak or low bandwidth, if you want. There's just not enough information there.
So building world models means observing the world and understanding why the world is evolving the way it is. And then the extra component of a world model is something that can predict how the world is going to evolve as a consequence of an action you might take, right? So one model really is, here is my idea of the state of the world at time t. Here is an action I might take.
So building world models means observing the world and understanding why the world is evolving the way it is. And then the extra component of a world model is something that can predict how the world is going to evolve as a consequence of an action you might take, right? So one model really is, here is my idea of the state of the world at time t. Here is an action I might take.
So building world models means observing the world and understanding why the world is evolving the way it is. And then the extra component of a world model is something that can predict how the world is going to evolve as a consequence of an action you might take, right? So one model really is, here is my idea of the state of the world at time t. Here is an action I might take.
What is the predicted state of the world at time t plus 1? Now that state of the world does not need to represent everything about the world. It just needs to represent enough that's relevant for this planning of the action, but not necessarily all the details. Now here is the problem. You're not going to be able to do this with generative models.
What is the predicted state of the world at time t plus 1? Now that state of the world does not need to represent everything about the world. It just needs to represent enough that's relevant for this planning of the action, but not necessarily all the details. Now here is the problem. You're not going to be able to do this with generative models.
What is the predicted state of the world at time t plus 1? Now that state of the world does not need to represent everything about the world. It just needs to represent enough that's relevant for this planning of the action, but not necessarily all the details. Now here is the problem. You're not going to be able to do this with generative models.
So a generative model that's trained on video, and we've tried to do this for 10 years. You take a video, show a system a piece of video, and then ask it to predict the reminder of the video. Basically, predict what's going to happen.
So a generative model that's trained on video, and we've tried to do this for 10 years. You take a video, show a system a piece of video, and then ask it to predict the reminder of the video. Basically, predict what's going to happen.
So a generative model that's trained on video, and we've tried to do this for 10 years. You take a video, show a system a piece of video, and then ask it to predict the reminder of the video. Basically, predict what's going to happen.
Either one frame at a time or a group of frames at a time. But yeah, a large video model, if you want. The idea of doing this has been floating around for a long time, and at FAIR, some of my colleagues and I have been trying to do this for about 10 years.
Either one frame at a time or a group of frames at a time. But yeah, a large video model, if you want. The idea of doing this has been floating around for a long time, and at FAIR, some of my colleagues and I have been trying to do this for about 10 years.
Either one frame at a time or a group of frames at a time. But yeah, a large video model, if you want. The idea of doing this has been floating around for a long time, and at FAIR, some of my colleagues and I have been trying to do this for about 10 years.
And you can't really do the same trick as with LLMs, because LLMs, as I said, you can't predict exactly which word is going to follow a sequence of words, but you can predict the distribution of words. Now, if you go to video, what you would have to do is predict the distribution over all possible frames in a video. And we don't really know how to do that properly.
And you can't really do the same trick as with LLMs, because LLMs, as I said, you can't predict exactly which word is going to follow a sequence of words, but you can predict the distribution of words. Now, if you go to video, what you would have to do is predict the distribution over all possible frames in a video. And we don't really know how to do that properly.
And you can't really do the same trick as with LLMs, because LLMs, as I said, you can't predict exactly which word is going to follow a sequence of words, but you can predict the distribution of words. Now, if you go to video, what you would have to do is predict the distribution over all possible frames in a video. And we don't really know how to do that properly.