Yann LeCun
So all the things that you do instinctively without really having to deliberately plan and think about it. And then there are all the tasks where you need to plan. So if you are a not-too-experienced chess player, or you are experienced but you play against another experienced chess player, you think about all kinds of options, right? You think about it for a while, right?
You're much better if you have time to think about it than you are if you play blitz with limited time. So this type of deliberate planning, which uses your internal world model, that's System 2. This is what LLMs currently cannot do. So how do we get them to do this? How do we build a system that can do this kind of planning or reasoning that devotes
more resources to complex problems than to simple problems. And it's not going to be autoregressive prediction of tokens. It's going to be something more akin to inference of latent variables in what used to be called probabilistic models or graphical models and things of that type. So basically, the principle is like this. The prompt is like observed variables.
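In the standard energy-based vocabulary, this says: treat the prompt as observed and infer the latent variables representing the answer by minimizing a learned energy function. The notation below is mine, a gloss on the idea rather than a formula from the conversation:

```latex
% Inference of latent variables as energy minimization (standard EBM
% formulation; notation is mine, not from the transcript):
%   x : the prompt (observed variables)
%   z : latent variables representing the answer
%   E : a learned energy function, low when z is a good answer for x
\[
  \hat{z} = \operatorname*{arg\,min}_{z} \, E_{\theta}(x, z)
\]
```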
And what the model does is measure to what extent an answer is a good answer for a prompt. So think of it as some gigantic neural net, but with only one output. And that output is a scalar number, which is, let's say, zero if the answer is a good answer for the question, and a large number if the answer is not a good answer for the question. Imagine you had this model.
If you had such a model, you could use it to produce good answers. The way you would do it is: take the prompt, then search through the space of possible answers for one that minimizes that number. That's called an energy-based model.
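As a concrete illustration, here is a minimal PyTorch sketch of such a network: one scalar output scoring a (prompt, answer) pair. Everything here is an illustrative assumption, not code from the conversation; the tiny MLP encoders stand in for what would really be large pretrained networks.

```python
# A minimal sketch (my construction, not LeCun's) of an energy-based model:
# a network mapping a (prompt, answer) pair to one scalar "energy",
# near zero for a good answer, large for a bad one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyModel(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        # Stand-ins for large pretrained encoders of prompt and answer.
        self.prompt_enc = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.answer_enc = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        # Reduce the joint representation to a single scalar output.
        self.head = nn.Linear(2 * d_model, 1)

    def forward(self, prompt: torch.Tensor, answer: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.prompt_enc(prompt), self.answer_enc(answer)], dim=-1)
        # Softplus keeps the energy non-negative: ~0 = good, large = bad.
        return F.softplus(self.head(z)).squeeze(-1)
```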
Well, really, what you would do is not search over possible strings of text that minimize that energy. Instead, you would do this in abstract representation space. So in the space of abstract thoughts, you would elaborate a thought, right, using this process of minimizing the output of your model, okay, which is just a scalar. It's an optimization process.
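Here is what that inference-time optimization could look like, again as a hedged sketch: gradient descent on a latent "thought" vector `z`, reusing the hypothetical `EnergyModel` above. The step count and learning rate are arbitrary choices for illustration.

```python
# Inference as optimization: refine a latent "thought" z, not text tokens.
# Sketch only; assumes the hypothetical EnergyModel defined above.
import torch

def infer_thought(energy_model, prompt_repr, d_model=512, steps=100, lr=0.1):
    # Start from a random abstract representation and descend the energy.
    z = torch.randn(1, d_model, requires_grad=True)
    # Only z is in the optimizer, so only z is updated; the model's
    # weights stay put during inference.
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy = energy_model(prompt_repr, z).sum()  # scalar to minimize
        energy.backward()
        opt.step()
    return z.detach()  # the optimized abstract representation of the answer

# Example: thought = infer_thought(EnergyModel(), torch.randn(1, 512))
```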
So now the way the system produces its answer is through optimization, by minimizing an objective function, basically. And we're talking about inference here, not training; the system has been trained already. So now we have an abstract representation of the answer, the thought. We feed that to, basically, an autoregressive decoder,
which can be very simple, that turns this into text that expresses this thought. So that, in my opinion, is the blueprint of future dialogue systems. They will think about their answer, plan their answer by optimization before turning it into text. And that is Turing-complete.
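And a sketch of that last decoding step, with the same caveats: LeCun notes the decoder can be very simple, so a tiny GRU serves here as a stand-in, conditioning on the optimized thought vector and emitting tokens greedily. Nothing here corresponds to any specific released model.

```python
# Toy autoregressive decoder: turns the optimized thought z into tokens.
# Illustrative sketch only; vocabulary, GRU, and greedy decoding are my
# assumptions, chosen for brevity.
import torch
import torch.nn as nn

class ThoughtDecoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    @torch.no_grad()
    def generate(self, z, bos_id=1, eos_id=2, max_len=50):
        # The optimized thought z conditions generation as the initial state.
        h = z.unsqueeze(0)               # (num_layers=1, batch=1, d_model)
        tok = torch.tensor([[bos_id]])   # start-of-sequence token
        out_ids = []
        for _ in range(max_len):
            emb = self.embed(tok)        # (1, 1, d_model)
            o, h = self.rnn(emb, h)
            tok = self.out(o[:, -1]).argmax(-1, keepdim=True)  # greedy pick
            if tok.item() == eos_id:
                break
            out_ids.append(tok.item())
        return out_ids

# Example: ids = ThoughtDecoder().generate(infer_thought(EnergyModel(),
#                                          torch.randn(1, 512)))
```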