Nathan Lambert

👤 Speaker
1665 total appearances

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

To give context, right? One of the parts of everyone freaking out about this was reaching the capabilities. The other aspect is they did it so cheap, right? And the so cheap, we kind of talked about on the training side, why it was so cheap.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So I think there's a couple factors here, right? One is that they do have model architecture innovations, right? This MLA, this new attention that they've done, is different than the attention from "Attention Is All You Need," the transformer attention, right? Now, others have already innovated.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

There's a lot of work like MQA, GQA, local-global, all these different innovations that try to bend the curve, right? It's still quadratic, but the constant is now smaller, right?
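
As a rough illustration of how MQA and GQA shrink that constant, the sketch below counts the key/value cache bytes stored per token when fewer key/value heads are shared across the query heads. The layer count, head count, head size, and 2-byte element width are hypothetical values chosen for illustration, not the shape of any model discussed in the episode.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of key/value cache stored per token across all layers."""
    # Factor of 2: one key vector and one value vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 32, 32, 128

kv_heads_by_variant = {
    "MHA": N_QUERY_HEADS,  # every query head has its own KV head
    "GQA": 8,              # groups of 4 query heads share one KV head
    "MQA": 1,              # all query heads share a single KV head
}

per_token = {
    name: kv_cache_bytes_per_token(N_LAYERS, n_kv, HEAD_DIM)
    for name, n_kv in kv_heads_by_variant.items()
}

for name, size in per_token.items():
    ratio = per_token["MHA"] // size
    print(f"{name}: {size} bytes/token ({ratio}x smaller than MHA)")
```

The total cache still grows with every token of context; sharing key/value heads only lowers the per-token cost.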

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's 80% to 90% versus the original, but then versus what people are actually doing, it's still an innovation.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Well, and not just that, right? Like other people have implemented techniques like local-global and sliding window and GQA and MQA. But anyways, like DeepSeek has their attention mechanism as a true architectural innovation. They did tons of experimentation. And this dramatically reduces the memory pressure. It's still there, right? It's still attention. It's still quadratic.
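
The two claims here, less memory pressure but still quadratic, can be separated with some back-of-the-envelope accounting. This is a minimal sketch under stated assumptions: it treats the memory saving as caching one compressed latent vector per token in place of full keys and values, and every dimension below is hypothetical rather than DeepSeek's actual configuration. It does not implement MLA itself.

```python
def causal_score_entries(seq_len):
    """Number of query-key scores causal attention computes for one head."""
    # Token i attends to tokens 1..i, so the total is 1 + 2 + ... + seq_len.
    return seq_len * (seq_len + 1) // 2


def latent_cache_saving(n_heads, head_dim, latent_dim):
    """Fraction of per-token cache saved by storing one latent vector
    instead of a full key and value vector for every head."""
    full_entries = 2 * n_heads * head_dim
    return 1 - latent_dim / full_entries


# Doubling the context roughly quadruples the attention scores to compute.
growth = causal_score_entries(8192) / causal_score_entries(4096)
print(f"score growth for 2x context: {growth:.4f}x")

# Hypothetical sizes: 32 heads of dimension 128, latent of dimension 1024.
saving = latent_cache_saving(n_heads=32, head_dim=128, latent_dim=1024)
print(f"per-token cache saving: {saving:.1%}")
```

With these made-up sizes the cache shrinks by a large fraction, while the number of attention scores still scales with the square of the context length.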

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's just dramatically reduced it relative to prior forms.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So I think this is very important, right? OpenAI is, you know, that drastic gap between DeepSeek and pricing. But DeepSeek is offering the same model, because they open-weighted it to everyone else, for a much lower price than what others are able to serve it for.
