
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Kind of like, yes, adding the human... Designing the perfect Google button. Google's famous for having people design buttons that are so perfect. And it's like, how is AI going to do that? Like, they could give you all the ideas, but...

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And humans are actually very good at reading or judging between two things. This goes back to the core of what RLHF and preference tuning are: it's hard to generate a good answer for a lot of problems, but it's easy to see which one is better. And that's how we're using humans for AI now, judging which one is better. And that's what software engineering could look like.
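
The pairwise idea described here, that judging is easier than generating, is usually formalized as a Bradley-Terry-style loss over a human's "A is better than B" votes. A minimal sketch (function name and numbers are illustrative, not any particular library's API):

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: push the reward model to score the
    human-preferred answer above the rejected one."""
    # -log(sigmoid(chosen - rejected)); shrinks as the margin grows
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The human judge never writes the good answer, only picks between two:
# a correct ordering with a wide margin gives a small loss,
# the reversed ordering gives a large one.
small = preference_loss(2.0, 0.0)
large = preference_loss(0.0, 2.0)
```

This is why preference data is cheap to collect relative to demonstrations: the label is a single comparison, not a full reference answer.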

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The PR review: here are a few options, here are some potential pros and cons. And they're going to be judges.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I'll explain what a Tulu is. A Tulu is a hybrid camel you get when you breed a dromedary with a Bactrian camel. Back in the early days after ChatGPT, there was a big wave of models coming out, like Alpaca, Vicuna, et cetera, that were all named after various mammalian species. So Tulu, the brand, is multiple years old, which comes from that.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

We've been playing at the frontiers of post-training with open-source code. The first part of this release was in the fall, where we built on Llama's open models, open-weight models, and then added in our fully open code and our fully open data. There's a popular benchmark, Chatbot Arena, and that's generally the metric by which these chat models are evaluated.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And it's humans comparing random models from different organizations. If you looked at the leaderboard in November or December, among the top 60 models, from tens of organizations, none of them had open code or data for just post-training. Even fewer, or none, have pre-training data and code available, but post-training is much more accessible at this time.
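
Leaderboards built from these anonymous head-to-head votes turn pairwise preferences into a ranking. A minimal online Elo update sketches the idea; this is illustrative only, since Chatbot Arena's published rankings are computed with a Bradley-Terry fit over all votes rather than sequential Elo:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One rating update from a single human vote in a head-to-head.
    Illustrative sketch, not Chatbot Arena's exact aggregation method."""
    # Probability the eventual winner was expected to win, given ratings
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)  # winner gains what the loser drops
    return r_winner + delta, r_loser - delta

# Hypothetical models starting at equal ratings; one vote moves both.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"]
)
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is what makes a few thousand votes enough to separate dozens of models.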

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's still pretty cheap and you can do it. And the thing is, how high can we push this number when people have access to all the code and data? So that's kind of the motivation of the project. We draw on lessons from Llama. NVIDIA had a Nemotron model where the recipe for their post-training was fairly open, with some data and a paper.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And it's putting all of these together to try to create a recipe with which people can fine-tune models like GPT-4 to their domain.