
Nathan Lambert

Person
1665 total appearances

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So it's like there's a distribution. The whole idea of grokking also comes in, right? It's like, just because it slowed down from improving in loss doesn't mean it's not learning, because all of a sudden it could be like this and it could just spike down in loss again because it learned, truly learned something, right? And it took some time for it to learn that.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's not like a gradual process, right? And that's what humans are like. That's what models are like. So it's really a stressful task, as you mentioned.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

There's the concept of a YOLO run. So YOLO, you only live once. And what it is, is like, you know, there's all this experimentation you do at the small scale, right? Research ablations, right? Like you have your Jupyter notebook where you're experimenting with MLA on like three GPUs or whatever. And you're doing all these different

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

things like, hey, do I do four active experts, 128 experts? Do I arrange the experts this way? You know, all these different model architecture things you're testing at a very small scale, right? A couple of researchers, a few GPUs, tens of GPUs, hundreds of GPUs, whatever it is. And then all of a sudden you're like, okay guys, no more fucking around, right? No more screwing around. Everyone, take all the resources we have, let's pick what we think will work, and just go for it, right? YOLO.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And this is where that sort of stress comes in as like, well, I know it works here, but some things that work here don't work here. And some things that work here don't work down here, right? In terms of scale, right? So it's really truly a YOLO run.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And sort of like there is this like discussion of like certain researchers just have like this methodical nature, like they can find the whole search space and like figure out all the ablations of different research and really see what is best. And there's certain researchers who just kind of like

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The search space is near infinite, right? And yet the amount of compute and time you have is very low. And you have to hit release schedules. You have to not get blown past by everyone. Otherwise, you know, what happened with DeepSeek, you know, crushing Meta and Mistral and Cohere and all these guys, they moved too slow, right? They maybe were too methodical. I don't know.
