Arvind Narayanan
Podcast Appearances
So I think that's running out. We're not going to have too many more cycles, possibly zero more cycles of a model that's almost an order of magnitude bigger in terms of the number of parameters than what came before and thereby more powerful. And I think a reason for that is data becoming a bottleneck.
These models are already trained on essentially all of the data that companies can get their hands on. So while data is becoming a bottleneck, I think more compute still helps, but maybe not as much as it used to. And the reason for that is that perhaps ironically, more compute allows one to build smaller models with the same capability level.
And that's actually the trend we've been seeing over the last year or so. As you know, the models today have gotten somewhat smaller and cheaper than when GPT-4 initially came out, but with the same capability level. So I think that's probably going to continue. Are we going to see a GPT-5 that's as big a leap over GPT-4 as GPT-4 was over GPT-3? I'm frankly skeptical.
Right. So there are a lot of sources that haven't been mined yet. But when we start to look at the volume of that data, how many tokens is that? I think the picture is a little bit different. 150 billion hours of video sounds really impressive.
But when you put that video through a speech recognizer and actually extract the text tokens out of it and deduplicate it and so forth, it's actually not that much. It's an order of magnitude smaller than what some of the largest models today have already been trained with.
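The conversion behind this argument can be sketched as a back-of-envelope calculation. All rates here are hypothetical illustrations, not measured values; the point is that even the raw ceiling shrinks dramatically once you account for how little of that video is unique, transcribable speech.

```python
# Rough back-of-envelope: converting hours of video into text tokens.
# Every rate below is a hypothetical placeholder, not a measured value.

HOURS_OF_VIDEO = 150e9   # the "150 billion hours" figure from the discussion
WORDS_PER_MINUTE = 150   # assumed average speaking rate
TOKENS_PER_WORD = 1.3    # assumed tokenizer expansion factor

raw_tokens = HOURS_OF_VIDEO * 60 * WORDS_PER_MINUTE * TOKENS_PER_WORD

# This raw figure assumes every hour is wall-to-wall unique speech.
# In practice much video contains no speech, repeats the same content,
# or transcribes poorly, so the usable token count after speech
# recognition and deduplication is a small fraction of this ceiling.
print(f"raw ceiling: {raw_tokens:.2e} tokens")
```

The interesting quantity is not the raw ceiling but the usable fraction that survives speech recognition, filtering, and deduplication, which is the speaker's point about why the token picture looks much less impressive than the hours figure.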
Now training on video itself, instead of text extracted from the video, that could lead to some new capabilities, but not in the same fundamental way that we've had before, where you have the emergence of new capabilities, right? Models being able to do things that people just weren't anticipating.
So, like the kind of shock that the AI community had back in the day, I think it was GPT-2,