And it's putting all these together to try to create a recipe that people can use to fine-tune models like GPT-4 to their own domain.
Tulu has been a series of recipes for post-training. So we've done multiple models over the years.
Yeah, if you start with an open-weight base model, the whole model technically isn't open source, because you don't know what Llama put into it, which is why we have a separate thing that we'll get to. But it's just getting parts of the pipeline where people can zoom in and customize.
I know I hear from startups and businesses, they're like, okay, I can take this post-training and try to apply it to my domain. We talk about verifiers a lot. We use this idea called reinforcement learning with verifiable rewards, RLVR, kind of similar to RLHF. And we applied it to math, and for the model today, we applied it to the Llama 405B base model from last year.
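[To make the RLVR idea concrete, here is a minimal sketch of a "verifiable reward" for math: instead of a learned reward model as in RLHF, the reward is a programmatic check of the model's final answer against a known ground truth. The function names and the answer-extraction heuristic are illustrative assumptions, not Ai2's actual Tulu code.]

```python
# Minimal sketch of a verifiable reward for math-style prompts.
# Assumption: the reference answer is a single number and the model states it
# somewhere in its completion. This stands in for the reward model in an RL loop.

import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last number out of a model completion (assumed answer format)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth.strip() else 0.0

if __name__ == "__main__":
    # In a PPO-style loop, each sampled completion on a math prompt would be
    # scored with this function and the scalar used to update the policy.
    print(verifiable_reward("The total is 12 apples, so the answer is 12.", "12"))  # 1.0
    print(verifiable_reward("I think the answer is 15.", "12"))                     # 0.0
```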
And we have our other stuff. We have our instruction tuning and our preference tuning. But the math thing is interesting, which is that it's easier to improve this math benchmark. There's a benchmark, MATH, all capitals. Tough naming, when the benchmark's name is the area that you're evaluating. We're researchers, we're not brand strategists.
And this is something that the DeepSeek paper talked about as well, which is that at this bigger model scale, it's easier to elicit powerful capabilities with this RL training, and then they distill it down from that big model to the small model. And with this model we released today, we saw the same thing at Ai2. We don't have a ton of compute. We can't train 405B models all the time.
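[A rough sketch of what "distill it down from the big model to the small model" can look like in practice, assuming the common recipe of sampling solutions from the large teacher, keeping only the traces the verifier accepts, and using them as supervised fine-tuning data for the student. Model names, the Hugging Face `pipeline` usage, and the `rlvr_sketch` helper module are illustrative assumptions, not the exact pipeline discussed here.]

```python
# Sketch: build distillation SFT data by filtering teacher samples with the verifier.

import json
from transformers import pipeline

from rlvr_sketch import verifiable_reward  # hypothetical module holding the checker above

# Assumed teacher checkpoint name; in practice this would be the large RL-trained model.
teacher = pipeline("text-generation", model="meta-llama/Llama-3.1-405B-Instruct")

problems = [
    {"question": "What is 7 * 8?", "answer": "56"},
]

sft_rows = []
for item in problems:
    # Sample several candidate solutions from the teacher.
    candidates = teacher(item["question"], max_new_tokens=256,
                         num_return_sequences=4, do_sample=True)
    for cand in candidates:
        completion = cand["generated_text"]
        # Keep only traces the verifier marks as correct (rejection sampling).
        if verifiable_reward(completion, item["answer"]) == 1.0:
            sft_rows.append({"prompt": item["question"], "completion": completion})
            break

# The filtered traces become standard SFT data for the smaller student model.
with open("distill_sft.jsonl", "w") as f:
    for row in sft_rows:
        f.write(json.dumps(row) + "\n")
```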
So we just did a few runs and they tend to work. And it just shows that there's a lot of room for people to play in these things. And they crushed Llama's actual release, right? Like they're way better than it. Yeah. So our eval numbers, I mean, we have extra months on this, but our eval numbers are much better than the Llama instruct model that they released.