Yann LeCun
which can be very simple, that turns this into a text that expresses this thought. So that, in my opinion, is the blueprint of future dialogue systems. They will think about their answer, plan their answer by optimization before turning it into text. And that is Turing-complete.
The space of representations. It goes from abstract representation to abstract representation. So you have an abstract representation inside the system. You have a prompt. The prompt goes through an encoder, which produces a representation, and that perhaps goes through a predictor that predicts a representation of the answer, of the proper answer.
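To make that pipeline concrete, here is a minimal sketch in PyTorch of the encode-then-predict step. The module architectures, the dimensionality, and all names are illustrative assumptions, not the actual system being described.

```python
import torch
import torch.nn as nn

D = 256  # assumed dimensionality of the abstract representation space

class Encoder(nn.Module):
    """Maps a (pre-embedded) prompt to an abstract representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))

    def forward(self, x):
        return self.net(x)

class Predictor(nn.Module):
    """Predicts an initial representation of the answer from the prompt's."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))

    def forward(self, s):
        return self.net(s)

encoder, predictor = Encoder(), Predictor()
prompt_repr = encoder(torch.randn(1, D))  # stand-in for an embedded prompt
z_answer = predictor(prompt_repr)         # initial guess at the answer's representation
```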
But that representation may not be a good answer, because there might be some complicated reasoning you need to do, right? So then you have another process that takes the representation of the answer and modifies it so as to minimize a cost function that measures to what extent the answer is a good answer for the question.
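Continuing the sketch, that cost function could be a small network mapping the pair (prompt representation, candidate answer representation) to a single scalar, where a low value means the candidate is a good answer. The architecture here is, again, an assumption for illustration.

```python
class Cost(nn.Module):
    """Scalar cost ('energy'): low when z is a good answer representation
    for the given prompt representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * D, D), nn.ReLU(), nn.Linear(D, 1))

    def forward(self, prompt_repr, z):
        return self.net(torch.cat([prompt_repr, z], dim=-1)).squeeze(-1)

cost = Cost()
print(cost(prompt_repr, z_answer))  # one scalar: how good is this candidate answer?
```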
Now, let's ignore for a moment the issue of how you train that system to measure whether an answer is a good answer for a question.
It's an optimization process. You can do this if the entire system is differentiable: that scalar output is the result of running the representation of the answer through some neural net. Then by gradient descent, by back-propagating gradients, you can figure out how to modify the representation of the answer so as to minimize that cost.
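Continuing the sketch above, gradient-based inference then amounts to freezing the network weights and running gradient descent on the answer representation itself. The step size and iteration count are arbitrary illustrative choices.

```python
# Optimize the representation of the answer, not the network weights.
z = z_answer.detach().clone().requires_grad_(True)
optimizer = torch.optim.SGD([z], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()
    energy = cost(prompt_repr.detach(), z).sum()  # scalar output of the cost net
    energy.backward()   # back-propagate gradients into z only
    optimizer.step()    # move z toward a lower-cost, i.e. better, answer
```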
So that's still gradient-based... It's gradient-based inference. So now you have a representation of the answer in abstract space, and you can turn it into text. And the cool thing about this is that the representation can now be optimized through gradient descent, but it's also independent of the language in which you're going to express the answer.
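A toy decoder, continuing the sketch: because the plan is formed in representation space, the same optimized representation could in principle be handed to decoders for different languages. The GRU head, vocabulary size, start token, and greedy decoding are all assumptions for illustration.

```python
class Decoder(nn.Module):
    """Turns the optimized answer representation into a token sequence."""
    def __init__(self, vocab=1000, max_len=20):
        super().__init__()
        self.embed = nn.Embedding(vocab, D)
        self.rnn = nn.GRUCell(D, D)
        self.out = nn.Linear(D, vocab)
        self.max_len = max_len

    def forward(self, z):
        h = z                                   # the representation seeds the state
        tok = torch.zeros(1, dtype=torch.long)  # assumed start-of-sequence token
        tokens = []
        for _ in range(self.max_len):
            h = self.rnn(self.embed(tok), h)
            tok = self.out(h).argmax(dim=-1)
            tokens.append(tok.item())
        return tokens

# The same z could feed an English decoder, a French decoder, and so on.
english_tokens = Decoder()(z.detach())
```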
Sensory information, right. Okay, but can this do something like reasoning, which is what we're talking about? Well, not really, only in a very simple way. I mean, basically, you can think of those things as doing the kind of optimization I was talking about, except they optimize in the discrete space, which is the space of possible sequences of tokens.
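For contrast, here is what searching in the discrete space of token sequences looks like in its simplest form: autoregressive decoding picks one token out of a finite vocabulary at each step, so there is no gradient over the answer to follow at inference time. The scoring interface below is hypothetical, standing in for a trained language model.

```python
def greedy_decode(next_token_logits, max_len=20, start_token=0):
    """next_token_logits(prefix) -> tensor of logits over the vocabulary
    (a hypothetical interface standing in for a trained language model)."""
    seq = [start_token]
    for _ in range(max_len):
        # Each step is a discrete choice among tokens, not a gradient step.
        seq.append(int(next_token_logits(seq).argmax()))
    return seq
```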