Jonathan Ross
No problem. But before we start, can I just say one thing? I think you have the most amazing, unique go-to-market that I've ever seen in my life for a podcast. I've never seen this before. I think your strategy is you're literally interviewing every single audience member, forcing them to watch videos and get addicted to you.
Well, my background: I started the Google TPU, the AI chip that Google uses, and in 2016 I started an AI chip startup called Groq, with a Q, not with a K, that builds AI accelerator chips, which we call LPUs.
Yes, it's Sputnik. It is Sputnik 2.0. Even more so, you know that story about how NASA spent a million dollars designing a pen that could write in space and the Russians brought a pencil. That just happened again. So it's a huge deal. Why is it such a huge deal? So up until recently, the Chinese models have been behind sort of Western models.
And I say Western including Mistral as well and some other companies. And it was largely focused on how much compute you could get. Most people actually don't realize this: most companies have access to roughly the same amount of data. They buy it from the same data providers, then just churn through that data with GPUs, and they produce a model and then they deploy it.
And they'll have some of their own data and that'll make them subtly better at one thing or another. But they're largely all the same. More GPUs, the better the model because you can train on more tokens. It's the scaling law. This model was supposedly trained on a smaller number of GPUs and a much, much tighter budget.
I think the way that it's been put is less than the salary of many of the executives at Meta, and that's not true. There's an element of marketing involved in the DeepSeek release. It is true that they trained the model on approximately $6 million for the GPUs, right? They claim 2,000
GPUs for, I think it was 60 days, which, by the way, also don't forget, was about the same amount of GPU time as the original Llama 70B, I believe: 4,000 GPUs for 30 days. Now more recently, Meta has been training on more GPUs, but Meta hasn't been using as much good data as DeepSeek, because DeepSeek was doing reinforcement learning using OpenAI.
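The GPU-time comparison above can be sanity-checked with quick arithmetic. Note that the 2,000-GPU/60-day, 4,000-GPU/30-day, and ~$6 million figures are the speaker's recollections of the claims, not verified numbers:

```python
# Rough GPU-time comparison using the figures quoted in the interview.
# These are the speaker's recollected claims, not verified numbers.

deepseek_gpu_days = 2_000 * 60   # claimed DeepSeek run: 120,000 GPU-days
llama_gpu_days = 4_000 * 30      # recalled Llama 70B run: 120,000 GPU-days

# Both runs come out to the same total GPU time, which is the point
# the speaker is making.
print(deepseek_gpu_days, llama_gpu_days)  # 120000 120000

# Implied cost per GPU-hour if the ~$6M figure covers this whole run
cost_usd = 6_000_000
gpu_hours = deepseek_gpu_days * 24
print(round(cost_usd / gpu_hours, 2))  # ~2.08 dollars per GPU-hour
```

That implied ~$2/GPU-hour rate is roughly in line with bulk cloud GPU rental pricing, which is why the $6 million figure is plausible as a cost for the final training run alone, even if it excludes research, salaries, and earlier experiments.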