Ilya Shumailov
Podcast Appearances
Large language models are statistical beasts: they learn from examples of human-written text, and they learn to produce text similar to the text the model was trained on.
If you were to sample data from the internet randomly today, I'm sure you'd find that a bigger proportion of it is generated by machines. But this is not to say that the data itself is bad. The main question is how much of this data is generated…
Quite a lot of these models, especially back at the time, were relatively low quality. So there are errors, and there are biases; there are systematic biases inside of those models. And thus you can imagine the case where, rather than learning useful contexts and useful concepts, you actually learn things that don't exist. They are purely hallucinations.
In the simple theoretical setups we consider, you're guaranteed to collapse.
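A minimal sketch of what that guarantee looks like in the simplest setting (my illustration, not code from the interview): fit a one-dimensional Gaussian, sample from the fit, refit on those samples, and repeat. The fitted variance drifts toward zero, so the learned distribution collapses to a point. The sample size and generation count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # generation 0: the "real" data is N(0, 1)
n = 100                # finite sample size available at each generation

for gen in range(1, 301):
    samples = rng.normal(mu, sigma, size=n)    # sample from the current model
    mu, sigma = samples.mean(), samples.std()  # refit the model on its own output
    if gen % 100 == 0:
        print(f"generation {gen:3d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")

# In expectation sigma^2 shrinks by a factor of (n-1)/n each generation,
# so over many generations the fitted Gaussian collapses toward a point.
```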
So there are three primary sources of error that we still have. The very first one is basically just data-associated errors. Usually those are questions along the lines of: do we have enough data to approximate a given process? If some things happen very infrequently in your underlying distribution, your model may get the wrong perception that some things are impossible.

Wait, what do you mean by they are impossible?

An example I've seen on Twitter: if you google for a baby peacock, you'll discover pictures of birds that look relatively realistic, but they are not peacocks at all. They are completely generated, and you will not find a real picture. But if you try learning anything from it, of course you're going to be absorbing this bias.
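A back-of-the-envelope sketch of why rare things become "impossible" after refitting (my illustration, with made-up numbers): an event of probability p is entirely absent from a sample of size n with probability (1 - p)^n, and once the refit model assigns it probability zero, every later generation inherits that zero.

```python
import numpy as np

rng = np.random.default_rng(1)

p_rare = 1e-3   # true probability of a rare event (assumed for illustration)
n = 1000        # finite sample the next generation is trained on

# Chance the rare event never shows up in the training sample:
print(f"P(event absent from sample) = {(1 - p_rare) ** n:.3f}")  # about 0.37

# One generation of learning from the model's own data: the empirical estimate.
sample = rng.random(n) < p_rare
p_hat = sample.mean()
print(f"empirical probability after one generation: {p_hat:.4f}")

# Whenever p_hat comes out as 0, the refit model can never generate the
# event again, so from then on it is "impossible" to every later model.
```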