Nathan Lambert

👤 Speaker
1665 total appearances

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I want to train there because that's where all of my GPUs are co-located, where I can connect them together at super high networking speeds, right? Because that's what you need for training. Now with pre-training, this is the old scale, right? You would increase parameters, you would increase data, and the model gets better, right?
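
A minimal sketch of the scaling-law intuition described here, assuming a Chinchilla-style loss curve (Hoffmann et al., 2022); the constants only loosely follow that published fit, and the function name and example numbers are illustrative assumptions, not anything from the episode:

# Illustrative Chinchilla-style scaling law: pre-training loss falls as
# parameter count (n_params) and training tokens (n_tokens) grow.
# Constants loosely follow the published Chinchilla fit; treat as illustrative.
def pretraining_loss(n_params: float, n_tokens: float,
                     e: float = 1.69, a: float = 406.4, b: float = 410.7,
                     alpha: float = 0.34, beta: float = 0.28) -> float:
    return e + a / n_params**alpha + b / n_tokens**beta

# More parameters and more data both push the predicted loss down.
print(pretraining_loss(70e9, 1.4e12))    # roughly 1.94
print(pretraining_loss(140e9, 2.8e12))   # lower, roughly 1.89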

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

That doesn't apply anymore because there's not much more data on the pre-training side. Yes, there's video and audio and image data that has not been fully taken advantage of, so there's a lot more scaling. But a lot of people have taken transcripts of YouTube videos, and that gets you a lot of the data. It doesn't get you all the learning value out of the video and image data.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But there's still scaling to be done on pre-training. But this post-training world is where all the flops are going to be spent, right? The model is going to play with itself. It's going to self-play. It's going to do verifiable tasks. It's going to do computer use in sandboxes. It might even do like simulated robotics things, right?
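
One concrete instance of the "verifiable tasks" mentioned here is grading a math answer exactly, or running generated code against tests. This is only a sketch: the function names, the solve(x) convention, and the 0/1 grading scheme are assumptions for illustration, not something from the episode.

# Sketch of verifiable rewards: the environment can check the output exactly,
# so no human labeling or learned reward model is needed.
def math_reward(model_answer: str, ground_truth: str) -> float:
    # 1.0 for an exact match after trimming whitespace, else 0.0.
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def code_reward(candidate_src: str, tests: list[tuple[str, str]]) -> float:
    # Fraction of (input, expected_output) pairs the candidate passes.
    # Executing untrusted model code is exactly why sandboxes come up here.
    namespace: dict = {}
    exec(candidate_src, namespace)  # assumes the source defines solve(x)
    passed = sum(str(namespace["solve"](inp)) == out for inp, out in tests)
    return passed / len(tests)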

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Like all of these things are going to be environments where compute is spent in quote-unquote post-training. But I think it's going to be good. We're going to drop the "post" from post-training. It's going to be pre-training, and it's going to be training, I think, at some point. Because for the bulk of the last few years, pre-training has dwarfed post-training.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But with these verifiable methods, especially ones that could potentially scale infinitely, like computer use and robotics, not just math and coding, where you can verify what's happening, those infinitely verifiable tasks, it seems you can spend as much compute as you want on them. Especially as the context length increases.
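
A rough sketch of why "spend as much compute as you want" works for verifiable tasks: extra compute just means more sampled attempts, and only the attempts the verifier accepts are kept as training signal. The generate/verify stand-ins and the n_samples knob below are assumptions for illustration.

import random

def collect_verified_rollouts(prompt, generate, verify, n_samples=64):
    # n_samples is the compute knob: more samples, more chances to find
    # attempts the verifier accepts (unit tests, a math checker, task
    # success in a sandbox, etc.).
    kept = []
    for _ in range(n_samples):
        attempt = generate(prompt)
        if verify(prompt, attempt):
            kept.append(attempt)
    return kept

# Toy usage with stand-in functions in place of a real model and checker.
verified = collect_verified_rollouts(
    "2 + 2 = ?",
    generate=lambda p: str(random.randint(0, 5)),
    verify=lambda p, a: a == "4",
)
print(len(verified), "verified attempts out of 64")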

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I was like, huh?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

TPU is awesome, right? It's great. Google is... They're a bit more tepid on building data centers for some reason. They're building big data centers, don't get me wrong. And they actually have the biggest cluster. I was talking about NVIDIA clusters. They actually have the biggest cluster, period. But the way they do it is very interesting, right? They have two data center...
