
Ilya Shumailov

Person
87 total appearances

Podcast Appearances

Short Wave
When AI Cannibalizes Its Data

Exactly. And then we also have empirical errors from, for example, hardware. The hardware we work with has practical limitations, and those errors exist as well.
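
A minimal sketch (not from the episode) of the kind of hardware-level error he is pointing at: finite floating-point precision makes even simple arithmetic slightly inexact, and those small errors are one of the practical limitations of real systems.

```python
import numpy as np

# Minimal sketch of finite-precision error: repeatedly adding 0.1 in
# float32 drifts away from the exact value, while float64 drifts far
# less. Neither is exact, because 0.1 has no finite binary expansion.
n = 100_000
total32 = np.float32(0.0)
for _ in range(n):
    total32 += np.float32(0.1)

total64 = 0.0
for _ in range(n):
    total64 += 0.1

print(f"exact:   {n * 0.1:.6f}")         # 10000.000000 (up to print rounding)
print(f"float64: {total64:.6f}")         # very close to exact
print(f"float32: {float(total32):.6f}")  # visibly off
```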

Short Wave
When AI Cannibalizes Its Data

Yes, certainly. So what we observe in simple theoretical models is that two main phenomena happen. The first phenomenon is that it's really hard to approximate improbable events, in part because you don't encounter them very often. So you may discover that you're collecting more and more data, and a lot of this data looks very similar to what you already possess.
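
A toy illustration of that first phenomenon (my sketch, not the episode's): events beyond three standard deviations of a Gaussian have probability of roughly 0.27%, so a finite sample often contains few or none of them, even as the bulk of the data piles up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Count how many draws from a standard Gaussian land beyond 3 sigma.
# The expected fraction is ~0.27%, so small samples frequently miss
# the tails entirely while the bulk looks like "more of the same".
for n in (100, 1_000, 10_000):
    sample = rng.standard_normal(n)
    tail_count = int(np.sum(np.abs(sample) > 3.0))
    print(f"n={n:>6}: {tail_count} draws beyond 3 sigma "
          f"(expected about {0.0027 * n:.1f})")
```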

Short Wave
When AI Cannibalizes Its Data

So you're not discovering too much information. But importantly, you're not discovering those infrequent data points. So those tail events kind of disappear. And then the other thing that happens is that the first time you make this error and underestimate your improbable events, and you fit the next model on top of this, it's unlikely to recover from that.
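
A discrete toy version of that compounding step (my construction, hedged accordingly): if a rare outcome fails to show up in the training sample, a model fit by simple frequency counting assigns it probability zero, and no later generation trained on that model's output can bring it back.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three outcomes; outcome 2 is the rare "tail" event. Each generation
# samples from the current model and refits by empirical frequencies.
# Once the rare outcome misses a sample, its probability hits zero and
# stays there: the next model cannot recover what it never saw.
p = np.array([0.600, 0.395, 0.005])
n = 200

for gen in range(6):
    print(f"gen {gen}: p = {np.round(p, 3)}")
    sample = rng.choice(3, size=n, p=p)
    p = np.bincount(sample, minlength=3) / n
```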

Short Wave
When AI Cannibalizes Its Data

Originally improbable events become even more improbable for the subsequent model, and it kind of snowballs out of control until the whole thing just collapses to near-zero variance.
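
A minimal sketch of that snowball in the simplest possible setting (a Gaussian refit with maximum-likelihood estimates, my assumption, not the episode's exact setup): each generation samples from the current fit and refits on those samples, and the fitted spread drifts toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Resample-and-refit chain: draw n points from the current Gaussian
# fit, refit mean and standard deviation (MLE), repeat. The fitted
# sigma random-walks downward and eventually collapses toward zero.
n, generations = 100, 2_000
mu, sigma = 0.0, 1.0

for gen in range(1, generations + 1):
    sample = rng.normal(mu, sigma, size=n)
    mu, sigma = sample.mean(), sample.std()   # biased MLE refit
    if gen in (1, 10, 100, 500, 1000, 2000):
        print(f"gen {gen:>4}: fitted sigma = {sigma:.3e}")
```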

Short Wave
When AI Cannibalizes Its Data

Exactly. And the thing is, you can theoretically describe this. It's actually very simple. And you can run these experiments however many times you want. You'll discover that even if you have a lot of data, if you keep repeating this process you always end up in a state where your improbable events kind of disappear, and you can also bound the rate at which this collapse happens.
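
The bounded rate can be made concrete in the same Gaussian toy setup (again my sketch, relying on the standard statistics fact that the biased sample variance of n draws has expectation (n-1)/n times the true variance): the expected variance decays geometrically, by exactly (n-1)/n per generation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Check the decay rate of the resample-and-refit chain: with biased
# MLE variance on n samples, E[var] shrinks by (n-1)/n per generation,
# so after g generations E[var] = ((n-1)/n) ** g. Averaging many
# independent chains should track that curve.
n, generations, trials = 50, 200, 500
mean_var = np.zeros(generations + 1)

for _ in range(trials):
    mu, var = 0.0, 1.0
    mean_var[0] += var
    for g in range(1, generations + 1):
        sample = rng.normal(mu, np.sqrt(var), size=n)
        mu, var = sample.mean(), sample.var()   # biased MLE
        mean_var[g] += var

mean_var /= trials
for g in (0, 50, 100, 200):
    predicted = ((n - 1) / n) ** g
    print(f"gen {g:>3}: mean variance {mean_var[g]:.4f}, "
          f"predicted {predicted:.4f}")
```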

Short Wave
When AI Cannibalizes Its Data

In practice, when we take large language models, we observe that they become more confident in the predictions they are making. So basically, the improbable events here are going to be things that the model is not very confident about and normally would not make predictions about.
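
A toy picture of that growing confidence (my sketch): treat a "model" as a single next-token distribution, refit it each generation by counting samples from the previous one, and watch the top probability climb while the entropy falls as low-probability tokens drop out.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

# A "model" here is just one next-token distribution. Each generation
# we sample n tokens from it and refit by empirical frequencies. Rare
# tokens drop out, mass concentrates, and confidence (top probability)
# rises while entropy falls toward zero.
p = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02])
n = 100

for gen in range(501):
    if gen in (0, 10, 50, 200, 500):
        print(f"gen {gen:>3}: top prob {p.max():.2f}, "
              f"entropy {entropy_bits(p):.2f} bits")
    sample = rng.choice(len(p), size=n, p=p)
    p = np.bincount(sample, minlength=len(p)) / n
```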

Short Wave
When AI Cannibalizes Its Data

So when you're trying to generate more data out of a language model in order for another language model to learn from it, over time it basically becomes more and more confident. And then, during generation, it very often gets stuck in these repetitive loops.
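
And a sketch of those repetitive loops with about the smallest "language model" one can write down, a bigram table (my toy, not the episode's experiment): each generation trains a new table on text sampled from the old one with no smoothing, so unseen transitions go to zero, the table sharpens, and the sampled text tends to cycle through the same few words.

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "on", "mat"]
V = len(tokens)
# Initial bigram table: row i is the next-token distribution after token i.
T = rng.dirichlet(np.ones(V), size=V)

def generate(table, length):
    """Sample a token sequence from the bigram table, starting at token 0."""
    seq = [0]
    for _ in range(length - 1):
        seq.append(int(rng.choice(V, p=table[seq[-1]])))
    return seq

for gen in range(30):
    train = generate(T, 100)
    # Refit the table on the generated text with no smoothing: any
    # transition absent from the sample gets probability zero, so the
    # table sharpens generation after generation.
    counts = np.zeros((V, V))
    for a, b in zip(train, train[1:]):
        counts[a, b] += 1.0
    counts[counts.sum(axis=1) == 0] = 1.0   # uniform backoff for unseen contexts
    T = counts / counts.sum(axis=1, keepdims=True)
    if gen in (0, 10, 20, 29):
        print(f"gen {gen:>2}: " + " ".join(tokens[i] for i in generate(T, 12)))
```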
