Jonathan Ross

👤 Person
408 total appearances

Podcast Appearances

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch
20VC: Deepseek Special: Is Deepseek a Weapon of the CCP | How Should OpenAI and the US Government Respond | Why $500BN for Stargate is Not Enough | The Future of Inference, NVIDIA and Foundation Models with Jonathan Ross @ Groq

And they'll have some of their own data and that'll make them subtly better at one thing or another. But they're largely all the same. The more GPUs, the better the model, because you can train on more tokens. It's the scaling law. This model was supposedly trained on a smaller number of GPUs and a much, much tighter budget.
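
As a rough aside on why GPU count turns into token count: for a fixed training run, the tokens a model sees scale with GPUs multiplied by time multiplied by per-GPU throughput. The sketch below uses the 2,000-GPU, 60-day figures quoted later in this episode, but the throughput number is invented purely for illustration.

```python
# Why "more GPUs" translates into "more tokens": for a fixed schedule,
# total tokens seen scale with GPU count x time x per-GPU throughput.
# The throughput figure below is made up purely for illustration.

num_gpus = 2_000
tokens_per_gpu_per_second = 5_000        # hypothetical sustained throughput
days = 60

total_tokens = num_gpus * tokens_per_gpu_per_second * days * 24 * 3600
print(f"{total_tokens:.2e} tokens")      # ~5.2e13 under these assumptions
```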

I think the way that it's been put is less than the salary of many of the executives at Meta, and that's not true. There's an element of marketing involved in the DeepSeek release. It is true that they trained the model on approximately $6 million for the GPUs, right? They claim 2,000

GPUs for, I think it was 60 days, which by the way, also don't forget, was about the same amount of GPU time (4,000 GPUs for 30 days) as the original, I believe Llama 70. Now more recently, Meta has been training on more GPUs, but Meta hasn't been using as much good data as DeepSeek, because DeepSeek was doing reinforcement learning using OpenAI.
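
A quick back-of-envelope check on the GPU-time comparison above, using only the figures quoted in the conversation; the implied cost per GPU-hour at the end is a derived number, not something stated in the episode.

```python
# Rough GPU-time comparison using the figures quoted above:
# 2,000 GPUs for 60 days vs. 4,000 GPUs for 30 days, and the
# ~$6 million training-cost claim. The cost-per-GPU-hour line is
# just the implied ratio, not a quoted fact.

HOURS_PER_DAY = 24

deepseek_gpu_hours = 2_000 * 60 * HOURS_PER_DAY   # "2,000 GPUs for 60 days"
llama_gpu_hours    = 4_000 * 30 * HOURS_PER_DAY   # "4,000 GPUs for 30 days"

print(deepseek_gpu_hours)                 # 2,880,000 GPU-hours
print(llama_gpu_hours)                    # 2,880,000 GPU-hours (same total)

claimed_cost = 6_000_000                  # "approximately $6 million"
print(claimed_cost / deepseek_gpu_hours)  # ~$2.08 implied per GPU-hour
```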

Yes, exactly.

It's a little bit like speaking to someone who's smarter and getting tutored by someone who's smarter. You actually do better than if you're speaking to someone who's not as knowledgeable about the area or giving you wrong answers. First of all, before we get into any of this, I need to start with the scaling laws. These are like the physics of LLMs.

And there's a particular curve. Tokens are sort of the syllables of an LLM; they don't match up exactly with human syllables, but kind of. So the more tokens that you train on, the better the model gets. But there's sort of these asymptotic returns where it starts trailing off.
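
For context, the "particular curve" is usually written as a Chinchilla-style power law in model parameters and training tokens; this is the standard published form, not a formula quoted in the episode.

```latex
% Chinchilla-style scaling law: expected loss as a function of model
% size N (parameters) and training tokens D. E is the irreducible loss;
% A, B, alpha, beta are fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The B / D^beta term is the "more tokens, better model" part, and its shrinking marginal contribution as D grows is the trailing-off described in the excerpt.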

The thing about the scaling law that everyone forgets, and that's why everyone was talking about how it's like the end of the scaling law, we're out of data on the internet, there's nothing left. What most people don't realize is that it assumes the data quality is uniform. If the data quality is better, then you can actually get away with training on fewer tokens.
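
One way to make the data-quality point concrete, reusing the token term from the formula above: if higher-quality data behaves like having more effective tokens, a smaller curated corpus can match a larger average one. Every constant and the quality multiplier below are invented for illustration; only the underlying intuition comes from the excerpt.

```python
# Toy illustration: treat data quality as a multiplier on effective
# token count inside a Chinchilla-style token term. The constants and
# the "quality" knob are made up; only the intuition (better data means
# you can train on fewer raw tokens) comes from the excerpt above.

def token_loss_term(tokens: float, quality: float = 1.0,
                    B: float = 400.0, beta: float = 0.3) -> float:
    effective_tokens = quality * tokens
    return B / (effective_tokens ** beta)

baseline = token_loss_term(1e12, quality=1.0)   # 1T tokens of average data
curated  = token_loss_term(5e11, quality=2.0)   # half the tokens, twice the "quality"
print(baseline, curated)                        # identical contributions to loss
```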

So going back to my background, one of the fun things that I got to witness, I wasn't directly involved, was AlphaGo. Google beat the world champion, Lee Sedol, in Go. That model was trained on a bunch of existing games. But later on, they created a new one called AlphaGo Zero, which was trained on no existing games. It just played against itself. So how do you play against yourself and win?
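
As a rough sketch of the self-play idea (a toy loop, not AlphaGo Zero's actual training pipeline, which pairs self-play with Monte Carlo tree search and a policy/value network): two copies of the same agent play each other on tic-tac-toe, and the game outcomes become the agent's own training data.

```python
import random
from collections import defaultdict

# Toy self-play loop: one tabular agent plays both sides of tic-tac-toe
# and learns position values purely from the outcomes of its own games.
# This only illustrates the self-play idea from the excerpt; it is not
# AlphaGo Zero, which uses tree search plus a neural network.

WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

value = defaultdict(float)   # state -> estimated value for the player who just moved
counts = defaultdict(int)

def choose_move(board, player, epsilon=0.1):
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if random.random() < epsilon:
        return random.choice(moves)          # explore
    # Greedy: pick the move whose resulting state the table likes most.
    def score(m):
        nxt = board[:m] + player + board[m+1:]
        return value.get(nxt, 0.0)
    return max(moves, key=score)

def self_play_game(epsilon=0.1):
    board, player = " " * 9, "X"
    states = {"X": [], "O": []}
    while True:
        move = choose_move(board, player, epsilon)
        board = board[:move] + player + board[move+1:]
        states[player].append(board)
        w = winner(board)
        if w or " " not in board:
            break
        player = "O" if player == "X" else "X"
    # Monte Carlo update: push each visited state's value toward the outcome.
    for p in "XO":
        reward = 0.0 if w is None else (1.0 if w == p else -1.0)
        for s in states[p]:
            counts[s] += 1
            value[s] += (reward - value[s]) / counts[s]

for _ in range(20_000):      # the agent generates all of its own training data
    self_play_game()
print(len(value), "positions evaluated purely from self-play")
```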