Yann LeCun
Podcast Appearances
But then there is an entire space of things that it cannot possibly have been trained on because the number is gigantic. So whatever training the system has been subjected to in order to produce appropriate answers, you can break it by finding a prompt that will be outside of the set of prompts it's been trained on, or things that are similar, and then it will just spew complete nonsense.
I mean, people have come up with things where you put essentially a random sequence of characters in a prompt, and that's enough to kind of throw the system into a mode where it's going to answer something completely different than it would have answered without this. So that's a way to jailbreak the system, basically go outside of its conditioning, right?
Yeah, some people have done things like, you write a sentence in English or you ask a question in English and it produces a perfectly fine answer. And then you just substitute a few words with the same words in another language. And all of a sudden, the answer is complete nonsense.
So the problem is that there is a long tail. Yes. This is an issue that a lot of people have realized in social networks and stuff like that, which is there's a very, very long tail of things that people will ask. And you can fine-tune the system for the 80% or whatever of the things that most people will ask.
And then this long tail is so large that you're not going to be able to fine-tune the system for all the conditions. And in the end, the system ends up being kind of a giant lookup table, right, essentially, which is not really what you want. You want systems that can reason, certainly that can plan. So the type of reasoning that takes place in LLMs is very, very primitive.
And the reason you can tell it's primitive is that the amount of computation that is spent per token produced is constant. So if you ask a question and that question has an answer in a given number of tokens, the amount of computation devoted to computing that answer can be exactly estimated.
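[Editor's sketch: the estimate described here can be written as a few lines of Python. This is only a rough illustration, not something from the conversation. It assumes the common rule of thumb of about 2 floating-point operations per parameter per generated token, and it ignores the attention cost that grows with context length. The function names and the 7-billion-parameter figure are hypothetical.]

```python
def forward_flops_per_token(n_params: int) -> int:
    """Approximate cost of producing one token.

    Assumption: about 2 FLOPs (one multiply, one add) per parameter,
    ignoring attention over the growing context.
    """
    return 2 * n_params


def answer_flops(n_params: int, n_answer_tokens: int) -> int:
    """Total compute for an answer: fixed per-token cost times token count."""
    return forward_flops_per_token(n_params) * n_answer_tokens


# Hypothetical model size and answer length, chosen only for illustration.
N_PARAMS = 7_000_000_000
ANSWER_TOKENS = 50

# Note that the question itself never appears as an input: under this
# estimate, an easy question and a hard one with answers of the same
# length get exactly the same amount of computation.
easy_question = answer_flops(N_PARAMS, ANSWER_TOKENS)
hard_question = answer_flops(N_PARAMS, ANSWER_TOKENS)
print(easy_question == hard_question)
print(f"{easy_question:.1e} FLOPs")
```

[The only way to spend more computation in this picture is to produce more tokens.]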
It's like, you know, it's the size of the prediction network, you know, with its 36 layers or 92 layers or whatever it is, multiplied by number of tokens, that's it. And so essentially it doesn't matter if the question being asked