
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But the stuff that trains the DeepSeek V3 and DeepSeek R1, those libraries, if you were to present them to us, I would guess are extremely high-quality code.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Some of them you do. Some of them are bad data. Can I give an AI2 example of what blew up our earlier models? It's a subreddit called Microwave Gang. We love to shout this out. It's a real thing. You can pull up Microwave Gang. Essentially, it's a subreddit where everybody makes posts that are just the letter M. So it's like, mmm.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So there are extremely long sequences of the letter M, and then the comments are like, beep beep, because it's when the microwave ends. But if you pass this into a model that's trained to produce normal text, it's an extremely high loss, because normally you see an M, you don't predict M's for a long time. So this is something that caused a lot of spikes for us.
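As a toy illustration of the point (mine, not from the episode): even a simple unigram character model fit to ordinary text assigns "m" a low probability, so the per-character negative log-likelihood of an all-M post sits well above typical prose — exactly the kind of batch that spikes the loss.

```python
import math
from collections import Counter

def char_nll(model_counts, total, text):
    """Average per-character negative log-likelihood (nats) under a
    unigram character model with add-one smoothing."""
    vocab = len(model_counts) + 1
    nll = 0.0
    for ch in text:
        p = (model_counts.get(ch, 0) + 1) / (total + vocab)
        nll += -math.log(p)
    return nll / len(text)

# "Training" distribution: ordinary English-like text.
corpus = "the quick brown fox jumps over the lazy dog " * 200
counts = Counter(corpus)
total = len(corpus)

normal_loss = char_nll(counts, total, "the model trains on text like this")
spike_loss = char_nll(counts, total, "m" * 500)  # Microwave Gang-style post

# The all-M sequence scores a much higher average loss than typical text.
print(f"normal: {normal_loss:.2f}  all-M: {spike_loss:.2f}")
```

A real LM's contrast is far starker: after a few M's, a good model would predict more M's, but a model trained on normal text keeps assigning them low probability for a long stretch.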

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But, much like, this is old, this is not recent. And when you have more mature data systems, that's not the thing that causes the loss spike. And what Dylan is saying is true, but there are levels to this sort of idea. With regards to the stress, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Tokens per second. Loss, not blown up. They're just walking, watching this.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

There are even different types of spikes. So Dirk Groeneveld has a theory that I like, which is fast spikes and slow spikes, where there are sometimes, where you're looking at the loss and the other parameters, you can see it start to creep up and then blow up. And that's really hard to recover from, so you have to go back much further.
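The fast/slow distinction can be sketched with a trailing-window monitor (my own illustration; the detector, window size, and threshold are assumptions, not anything described in the episode). A single-step outlier can just be skipped, but a slow spike — loss creeping up over many steps — has to be caught by comparing the recent average to an older one, after which you'd roll back to an earlier checkpoint.

```python
from collections import deque

class SlowSpikeDetector:
    """Flag a 'slow spike': loss trending upward over a window of steps,
    rather than a single outlier step. Thresholds are illustrative."""

    def __init__(self, window=50, tolerance=1.05):
        self.window = window
        self.tolerance = tolerance
        self.losses = deque(maxlen=window)

    def update(self, loss):
        self.losses.append(loss)
        if len(self.losses) < self.window:
            return False  # not enough history yet
        half = self.window // 2
        older = sum(list(self.losses)[:half]) / half
        recent = sum(list(self.losses)[half:]) / (self.window - half)
        # Slow spike: recent average has crept above the older average.
        return recent > self.tolerance * older

# Flat loss never trips the detector; a steady creep eventually does.
flat_det = SlowSpikeDetector(window=10)
flat = [flat_det.update(2.0) for _ in range(20)]

creep_det = SlowSpikeDetector(window=10)
creep = [creep_det.update(2.0 + 0.1 * i) for i in range(20)]

print(any(flat), any(creep))
```

On a trip, the recovery the quote describes is restoring an older checkpoint, since by the time a slow spike is visible the recent weights are already compromised.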

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So you have the stressful period where it's flat or might start going up, and you're like, what do I do? Whereas there are also loss spikes where it looks good, and then there's one spiky data point. And what you can do is just skip those: you see that there's a spike, you're like, okay, I can ignore this data, don't update the model, do the next one, and it'll recover quickly.
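The skip-the-spiky-batch recipe can be sketched in a few lines. This is a toy 1-D least-squares model of my own, not anyone's actual training stack, and the `spike_factor` threshold is an arbitrary illustration: if a batch's loss blows far past the running average, skip the gradient update for that batch entirely.

```python
def train_with_spike_skip(batches, lr=0.05, spike_factor=5.0):
    """Toy 1-D SGD (fit y = w*x by squared error) that skips the update
    for any batch whose loss spikes far above the running average."""
    w, avg_loss, skipped = 0.0, None, []
    for step, (x, y) in enumerate(batches):
        pred = w * x
        loss = (pred - y) ** 2
        if avg_loss is not None and loss > spike_factor * avg_loss:
            skipped.append(step)  # spiky batch: don't update the model
            continue
        grad = 2 * (pred - y) * x  # d(loss)/dw
        w -= lr * grad
        # Exponential moving average over accepted batch losses.
        avg_loss = loss if avg_loss is None else 0.9 * avg_loss + 0.1 * loss
    return w, skipped

# Clean batches from y = 3x, plus one corrupted "Microwave Gang" batch.
batches = [(x, 3.0 * x) for x in [1.0, 0.5, 1.2, 0.8] * 10]
batches.insert(20, (1.0, 100.0))  # bad data point
w, skipped = train_with_spike_skip(batches)
print(w, skipped)
```

The corrupted batch is skipped and training recovers quickly, landing near the true slope of 3 — the "one spiky data point" case from the quote, as opposed to the slow creep that forces a checkpoint rollback.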
