Regina Barber
Ilya and his team did this research study indicating that eventually, any large language model that learns from its own synthetic data would start to degrade over time, producing results that got worse and worse and worse.
So today on the show, AI model collapse. What happens when a large language model reads too much of its own content? And could it limit the future of generative AI? I'm Regina Barber, and you're listening to Short Wave, the science podcast from NPR.
OK, Ilya, before we get into the big problem of like model collapse, I think we need to understand why these errors are actually happening. So can you explain to me what kinds of errors do you get from a large language model and like how do they happen? Why do they happen?
Right. You're like telling me now that there's like a lot of fake baby peacock images, but machines don't know that. Right. They're just going to think, great, this is a baby peacock. And also there's not that many like real baby peacock images to compare it to.
Not just babies, but baby birds. Not just baby birds, but baby peacocks.
Okay, so that's one kind of problem, a data problem. What are the other two?
It's like a black box. We don't know how it's making these decisions. We don't know where, like you said, in that order, it's fixing those decisions. Yeah.
Right, right. Okay, so the three places errors could come from are: one, the model itself; two, the way it's trained, right? And three, the data, or the lack of data, that it's trained on.