
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And it adds up.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I think we should summarize what the bitter lesson actually is about. The bitter lesson, essentially, if you paraphrase it, is that the types of training that will win out in deep learning as we go are those methods which are scalable; learning and search is what it calls out. And this scale word gets a lot of attention.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The interpretation that I use is effectively to avoid adding human priors to your learning process. And if you read the original essay, that is what it talks about: how

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Researchers will try to come up with clever solutions to their specific problem that might get them small gains in the short term, while simply enabling these deep learning systems to work efficiently on these bigger problems might be more likely to scale and continue to drive success in the long term.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

We were talking about relatively small implementation changes to the mixture-of-experts model. And therefore it's like, okay, we will need a few more years to know if one of these is actually really crucial to the bitter lesson. But the bitter lesson is really this long-term arc of how simplicity can often win.
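
(For context: a mixture-of-experts layer replaces a dense feed-forward block with several small expert networks plus a router that sends each token to a few of them. The sketch below is a generic illustration of that idea, not DeepSeek's implementation; all names and sizes are invented.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        # Toy top-k routed mixture of experts: a linear router scores the
        # experts for each token, and each token is processed by its top_k
        # experts, weighted by the softmaxed router scores.
        def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                                # x: (n_tokens, d_model)
            scores, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = F.softmax(scores, dim=-1)              # (n_tokens, top_k)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                    # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, k:k + 1] * expert(x[mask])
            return out

    y = MoELayer()(torch.randn(10, 512))  # -> (10, 512)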

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

There are a lot of sayings in the industry, like "the models just want to learn." You have to give them the simple loss landscape where you put compute through the model and they will learn, and get barriers out of the way.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I'm sure they have them; DeepSeek definitely has codebases that are extremely messy where they're testing these new ideas. Multi-head latent attention probably could start in something like a Jupyter notebook, where somebody tries something on a few GPUs, and that is really messy.
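
(For context: the core idea behind multi-head latent attention is to compress keys and values into one small latent vector per token and cache only that, expanding per-head keys and values on the fly. Below is a rough, notebook-style sketch of that compression idea; it omits DeepSeek's actual details, such as the decoupled RoPE path and query compression, and every dimension is made up.)

    import torch
    import torch.nn as nn

    class LatentKVAttention(nn.Module):
        # Toy low-rank KV compression: cache one small latent per token
        # instead of full per-head keys and values. Causal masking and
        # RoPE are omitted for brevity.
        def __init__(self, d_model=512, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.kv_down = nn.Linear(d_model, d_latent)  # compress to latent
            self.k_up = nn.Linear(d_latent, d_model)     # latent -> per-head keys
            self.v_up = nn.Linear(d_latent, d_model)     # latent -> per-head values
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, x):                 # x: (batch, seq, d_model)
            b, t, _ = x.shape
            latent = self.kv_down(x)          # (b, t, d_latent): the only KV state to cache
            heads = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
            q = heads(self.q_proj(x))
            k = heads(self.k_up(latent))
            v = heads(self.v_up(latent))
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
            return self.out_proj(y)

    y = LatentKVAttention()(torch.randn(2, 16, 512))  # -> (2, 16, 512)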

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But the stuff that trains DeepSeek V3 and DeepSeek R1, those libraries, if you were to present them to us, I would guess are extremely high-quality code.