Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But these depend on trickier implementations. So as you get more complex in your architecture and you scale up to more GPUs, you have more potential for your loss blowing up.
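
The point about loss blow-ups at scale is usually handled with guardrails in the training loop. The sketch below is illustrative only, not something described in the episode: a PyTorch-style loop that clips gradient norms and rolls back to the last known-good weights when the loss spikes. The loop structure, names, and thresholds are all assumptions.

import copy
import math
import torch

def train_with_spike_guard(model, optimizer, dataloader, loss_fn,
                           max_grad_norm=1.0, spike_factor=3.0):
    # Keep an in-memory snapshot of the last known-good weights. A real setup
    # would also snapshot optimizer state and checkpoint to disk.
    good_state = copy.deepcopy(model.state_dict())
    running_loss = None

    for step, (inputs, targets) in enumerate(dataloader):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)

        # Treat NaN/inf, or a loss far above its running average, as a blow-up:
        # roll back to the snapshot and skip this batch.
        spiked = (not math.isfinite(loss.item())) or (
            running_loss is not None and loss.item() > spike_factor * running_loss)
        if spiked:
            model.load_state_dict(good_state)
            continue

        loss.backward()
        # Gradient-norm clipping is the usual first line of defense.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()

        # Update the running average and periodically refresh the snapshot.
        running_loss = loss.item() if running_loss is None else (
            0.99 * running_loss + 0.01 * loss.item())
        if step % 100 == 0:
            good_state = copy.deepcopy(model.state_dict())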

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Every company has failed runs. You need failed runs to push the envelope on your infrastructure. So a lot of news cycles are made of "X company had Y failed run." Every company that's trying to push the frontier of AI has these. So yes, it's noteworthy because it's a lot of money and it can be a week-to-month setback, but it is part of the process.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Key hyperparameters like learning rate, regularization, and things like this. And you find the regime that works for your code base. Talking to people at frontier labs, there's a story you can tell where training language models is kind of a path that you need to follow. So you need to unlock the ability to train a certain type of model, or at a certain scale.
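
As a rough illustration of "finding the regime" (not something described in the episode), one common pattern is a small sweep over learning rate and weight decay on a scaled-down proxy run before committing to the large run. The helper name below is hypothetical.

import itertools

def find_regime(train_small_proxy):
    # train_small_proxy(lr, weight_decay) is a hypothetical helper that trains
    # a scaled-down model and returns its final validation loss.
    learning_rates = [1e-4, 3e-4, 1e-3]
    weight_decays = [0.0, 0.01, 0.1]

    results = {}
    for lr, wd in itertools.product(learning_rates, weight_decays):
        results[(lr, wd)] = train_small_proxy(lr, wd)

    # The best small-scale regime becomes the starting point for larger runs.
    best_lr, best_wd = min(results, key=results.get)
    print(f"best regime: lr={best_lr}, weight_decay={best_wd}, "
          f"val loss={results[(best_lr, best_wd)]:.4f}")
    return best_lr, best_wd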

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And then your code base and your internal know-how of which hyperparameters work for it is kind of known. And if you look at the DeepSeek papers and models, they've scaled up, they've added complexity, and it's just continuing to build on the capabilities that they have.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

You know, have that innate gut instinct of, like, this is the YOLO run. Like, you know, looking at the data, this is it. This is why you want to work in post-training, because the GPU cost for training is lower, so you can make a higher percentage of your training runs YOLO runs. Yeah, for now. Yeah, for now. For now. So some of this is fundamentally luck still? Luck is skill, right, in many cases.
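
The post-training point is really just budget arithmetic. A toy calculation with made-up numbers (not figures from the episode): the cheaper each run is, the more of your total budget you can afford to spend on experimental, YOLO-style runs.

# All numbers are made up for illustration.
gpu_hour_budget = 1_000_000   # total GPU-hours available (assumed)
pretrain_cost = 200_000       # GPU-hours for one pretraining run (assumed)
posttrain_cost = 5_000        # GPU-hours for one post-training run (assumed)

pretrain_runs = gpu_hour_budget // pretrain_cost     # -> 5 shots at pretraining
posttrain_runs = gpu_hour_budget // posttrain_cost   # -> 200 shots at post-training

print(f"pretraining runs: {pretrain_runs}, post-training runs: {posttrain_runs}")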

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, I mean, it looks lucky, right? But the hill to climb: if you're at one of these labs, you have an evaluation you're not crushing. There's a repeated playbook for how you improve things. There are localized improvements, which might be data improvements, and these add up into the whole model just being much better.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And when you zoom in really close, it can be really obvious that this model is just really bad at this thing, and we can fix it, and you just add these up. So some of it feels like luck, but on the ground, especially with these new reasoning models we're talking about, there are just so many ways that we can poke around, and normally some of them give big improvements.
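
"Zooming in" on an evaluation usually means slicing results by category to see where the model is clearly weak. A minimal sketch, with a made-up result format (nothing here is from the episode):

from collections import defaultdict

def weak_categories(eval_results, threshold=0.5):
    # eval_results: list of dicts like {"category": "math", "correct": True};
    # the format is assumed for illustration.
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in eval_results:
        totals[row["category"]] += 1
        correct[row["category"]] += int(row["correct"])

    # Categories scoring below the threshold are candidates for targeted fixes,
    # often data improvements, and those fixes add up across the whole model.
    return {cat: correct[cat] / totals[cat]
            for cat in totals if correct[cat] / totals[cat] < threshold}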
