
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And then when this release was happening, or soon after (we don't know their exact timeline), they were finishing a different training process on the same next-token-prediction model that I talked about. This is when the new reasoning training that people have heard about comes in, in order to create the model that is called DeepSeek R1.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The R, through this conversation, is a good grounding for reasoning, and the name is also similar to OpenAI's O1, which is the other reasoning model that people have heard about. And we'll have to break down the training for R1 in more detail, because for one, we have a paper detailing it, but also it is a far newer set of techniques for the AI community.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So it's a much more rapidly evolving area of research.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so pre-training, and I'm using some of the same words to really get the message across, is you're doing what is called autoregressive prediction to predict the next token in a series of documents. This is done, by standard practice, over trillions of tokens. So this is a ton of data that is mostly scraped from the web.
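The autoregressive objective described in this quote can be sketched in a few lines. This is a toy character-level illustration (a count-based bigram model, not the speaker's actual setup): the model learns which token tends to follow which, and is scored with the same cross-entropy loss that pre-training minimizes at scale.

```python
# Toy illustration of autoregressive next-token prediction:
# a character-level bigram model "trained" by counting, then scored
# with average cross-entropy (negative log-likelihood) per token.
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the dog sat on the log."

# "Training": count how often each character follows each other character.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    # Normalize follow-counts into a probability distribution.
    c = counts[prev]
    total = sum(c.values())
    return {ch: n / total for ch, n in c.items()}

def avg_cross_entropy(text):
    # Average negative log-likelihood of each next character, in nats.
    losses = []
    for prev, nxt in zip(text, text[1:]):
        p = next_token_probs(prev).get(nxt, 1e-9)  # tiny floor for unseen pairs
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

loss = avg_cross_entropy(corpus)
```

Real pre-training replaces the count table with a neural network and the toy corpus with trillions of scraped tokens, but the loss being driven down is this same quantity.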

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

In some of DeepSeek's earlier papers, they talk about their training data being distilled for math (I shouldn't use this word yet), but taken from Common Crawl. And that's publicly accessible: anyone listening to this could go download data from the Common Crawl website. This is a crawler that is maintained publicly.
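To make the "anyone can download it" point concrete, Common Crawl publishes per-crawl path listings whose entries resolve against the public host `data.commoncrawl.org`. A minimal sketch of turning such a listing into download URLs (the crawl ID and file names below are illustrative, not a real listing):

```python
# Sketch: build download URLs from a Common Crawl paths listing.
# The listing contents here are made-up placeholders; real listings
# (e.g. wet.paths.gz per crawl) are published at commoncrawl.org.
sample_paths = """\
crawl-data/CC-MAIN-2024-10/segments/1707.0/wet/CC-MAIN-0001.warc.wet.gz
crawl-data/CC-MAIN-2024-10/segments/1707.0/wet/CC-MAIN-0002.warc.wet.gz
"""

BASE = "https://data.commoncrawl.org/"  # public download host

def to_urls(paths_listing):
    """Turn a paths-style listing into full download URLs."""
    return [BASE + line for line in paths_listing.splitlines() if line]

urls = to_urls(sample_paths)
```

Each resulting URL can then be fetched with any HTTP client; labs typically filter and deduplicate this raw data heavily before training on it.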

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yes, other tech companies eventually shift to their own crawler, and DeepSeek likely has done this as well, as most frontier labs do. But this sort of data is something that people can get started with. And you're just predicting text in a series of documents. This can be scaled to be very efficient.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And there are a lot of numbers that are thrown around in AI training, like how many floating-point operations, or FLOPs, are used. And then you can also look at how many hours of these GPUs are used. And it's largely one loss function taken to a very large amount of compute usage; you just set up really efficient systems. And then at the end of that, you have the base model.
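The FLOPs and GPU-hours numbers mentioned in this quote can be connected with a standard back-of-envelope rule: training compute is roughly 6 FLOPs per parameter per training token. The model size, token count, per-GPU throughput, and utilization below are illustrative assumptions, not figures from the episode:

```python
# Back-of-envelope pre-training compute using the common ~6*N*D
# approximation (6 FLOPs per parameter per training token).
def train_flops(n_params, n_tokens):
    # Total training FLOPs for n_params parameters over n_tokens tokens.
    return 6 * n_params * n_tokens

def gpu_hours(total_flops, flops_per_gpu_per_sec, utilization=0.4):
    # Convert total FLOPs into wall-clock GPU-hours, discounting peak
    # hardware throughput by an assumed utilization fraction.
    effective = flops_per_gpu_per_sec * utilization
    return total_flops / effective / 3600

flops = train_flops(7e9, 2e12)   # assumed: 7B params, 2T tokens -> 8.4e22 FLOPs
hours = gpu_hours(flops, 1e15)   # assumed: ~1 PFLOP/s peak per GPU, 40% utilization
```

Under these assumed numbers the run works out to roughly 58,000 GPU-hours, which is the kind of figure that gets "thrown around" when labs report training costs.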
