Arvind Narayanan
So think about the kind of shock the AI community had back in the day with, I think, GPT-2. It was trained primarily on English text, and they had actually tried to filter out text in other languages to keep it clean, but a tiny amount of text in other languages had gotten in anyway, and it turned out that was enough for the model to pick up a reasonable level of competence at conversing in various other languages.
So these are the kinds of emergent capabilities that really spooked people and that have led to both a lot of hype and a lot of fear about what bigger and bigger models will be able to do. But I think that has pretty much run out, because we're already training on all of the capabilities that humans have expressed and put out there in the form of text, like translating between languages.
So if you make the data set a little more diverse with YouTube video, I don't think that's fundamentally going to change. Multimodal capabilities, yes, there's a lot of room there. But new emergent text capabilities, I'm not sure.

MARK BLYTH: What about synthetic data?
Yeah, let's talk about synthetic data. There are two ways to look at this, right? One is the way synthetic data is being used today, which is not to increase the volume of training data but to overcome limitations in the quality of the training data that we do have.
So for instance, if there's too little data in a particular language, you can try to augment it, or you can have a model solve a bunch of mathematical equations and throw those solutions into the training data. Then, for the next training run, that becomes part of the pre-training, and the model will get better at doing that.
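To make that augmentation idea concrete, here is a minimal sketch in Python. It is not anything from the conversation: programmatically generated arithmetic stands in for a real teacher model, and the file name and JSONL layout are assumptions made purely for illustration.

```python
# Minimal sketch of synthetic-data augmentation for pre-training (illustrative only).
# Simple arithmetic problems are generated and solved programmatically here; in
# practice a strong model would produce the solutions, and a quality filter would
# discard incorrect ones before they enter the corpus.
import json
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def make_arithmetic_example(rng: random.Random) -> dict:
    """Generate one synthetic question/answer pair as a training document."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    sym = rng.choice(list(OPS))
    answer = OPS[sym](a, b)  # ground truth; stands in for a verified model answer
    return {
        "text": f"Question: What is {a} {sym} {b}?\nAnswer: {answer}",
        "source": "synthetic-math",  # tag so synthetic examples can be traced or down-weighted
    }

def augment_corpus(path: str, n_examples: int, seed: int = 0) -> None:
    """Append synthetic examples to a JSONL pre-training corpus file."""
    rng = random.Random(seed)
    with open(path, "a", encoding="utf-8") as f:
        for _ in range(n_examples):
            f.write(json.dumps(make_arithmetic_example(rng)) + "\n")

if __name__ == "__main__":
    # "pretraining_corpus.jsonl" is a hypothetical file name for this sketch.
    augment_corpus("pretraining_corpus.jsonl", n_examples=1000)
```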
And the other way to look at synthetic data is, okay, you take 1 trillion tokens, you train a model on it, and then you output 10 trillion tokens, so you get to the next bigger model, and then you use that to output 100 trillion tokens. I'll bet that that's just not going to happen. That's just a snake eating its own tail, and...
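The "snake eating its own tail" intuition can be shown with a toy numerical experiment; this is an editorial illustration, not something from the conversation. A simple "model" (a Gaussian fit) is repeatedly retrained on samples drawn from its own previous fit, with no fresh real data, and the distribution it has learned tends to drift and collapse.

```python
# Toy illustration of recursive training on purely synthetic data (illustrative only).
# Each generation fits a Gaussian to samples produced by the previous generation's fit.
# Because no new real data ever enters, estimation noise compounds and the learned
# distribution typically collapses over time.
import random
import statistics

def fit_gaussian(samples):
    """'Train' the model: estimate mean and standard deviation from data."""
    return statistics.mean(samples), statistics.stdev(samples)

def sample_gaussian(mean, std, n, rng):
    """'Generate synthetic data' from the current model."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)
n = 20                                      # samples per generation (small, to make the effect visible)
data = sample_gaussian(0.0, 1.0, n, rng)    # generation 0: "real" data from N(0, 1)

for generation in range(1, 51):
    mean, std = fit_gaussian(data)              # train on the previous generation's output
    data = sample_gaussian(mean, std, n, rng)   # then use only that output as the next "corpus"
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mean:+.3f}  std={std:.3f}")

# The estimated std typically drifts well below 1.0: the chain forgets the original
# distribution because it never learns anything that was not already in its own output.
```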
What we've learned in the last two years is that the quality of data matters a lot more than the quantity of data. So if you're using synthetic data to try to augment the quantity, I think it's just coming at the expense of quality. You're not learning new things from the data. You're only learning things that are already there.