Dylan Patel
And the important thing to say is that no matter how you want the model to behave, these RLHF and preference-tuning techniques also improve performance on things like math evals and code evals. There is something innate to these, what are called, contrastive loss functions. We could start to get into RL here.
We don't really need to, but RLHF also boosts performance on anything from a chat task to a math problem to a code problem, so it is becoming a much more useful tool for these labs. That kind of takes us through the arc: we've talked about pre-training, where things are hard to get rid of, and we've talked about post-training and how, with post-training, you can mess it up.
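To make the "contrastive loss" point concrete, here is a minimal sketch, in PyTorch, of the pairwise loss that reward models and DPO-style preference tuning build on: the model is pushed to score the human-preferred response above the rejected one. The function name and toy numbers are illustrative, not any lab's actual code.

```python
# Minimal sketch of a pairwise ("contrastive") preference loss, as used for
# reward modeling and DPO-style tuning. Names and numbers are illustrative.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(chosen_scores: torch.Tensor,
                             rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: -log sigmoid(score_chosen - score_rejected).

    The scores are scalars per comparison, e.g. reward-model outputs or
    (in DPO) scaled log-probability differences.
    """
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy example: three pairwise comparisons; the loss shrinks as the
# chosen responses are scored increasingly above the rejected ones.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, -0.5])
print(pairwise_preference_loss(chosen, rejected))
```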
It's a complex, multifaceted optimization with 10-to-100-person teams converging on one artifact, and it's really easy to not do it perfectly. And then there's the third case, which is what we talked about with Gemini. The thing about Gemini is that it was a served product: Google has their internal model weights, and they've done all these processes that we talked about.
And in the served product, what came out afterwards was that they had a prompt rewriting user queries to boost diversity, or something like that, and it made the outputs just blatantly wrong. It was some sort of organizational failure that put that prompt in that position. I think Google executives have probably owned this; I didn't pay attention to that detail.
But it was just a mess-up in execution that led to this ridiculous thing. At the system level, the model weights might have been fine.
It was something like the system prompt, or what in industry is called prompt rewriting. Especially for image models, if you're using DALL-E, or ChatGPT, which can generate you an image, you'll say, "Draw me a beautiful car." These leading image models benefit from highly descriptive prompts.
So what would happen is, if you do that in ChatGPT, a language model behind the scenes will rewrite the prompt, essentially being told "make this more descriptive," and then the rewritten prompt is passed to the image model. So prompt rewriting is something that is used at multiple levels in industry; it's used effectively for image models, and the Gemini example is just a failed execution of it.
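As a rough sketch of that pipeline (not OpenAI's or Google's actual implementation): the user's terse request goes to a language model with a rewrite instruction, and only the rewritten, more descriptive prompt ever reaches the image model. The function names and the rewrite instruction below are hypothetical stand-ins so the example runs end to end.

```python
# Minimal sketch of the prompt-rewriting pattern for image generation.
# `call_language_model` and `call_image_model` are hypothetical stand-ins
# for whatever text and image APIs a product actually uses; the rewrite
# instruction is illustrative, not any lab's real system prompt.

REWRITE_INSTRUCTION = (
    "Rewrite the user's image request as one highly descriptive prompt. "
    "Add concrete details about subject, style, lighting, and composition, "
    "but do not change the user's intent."
)

def call_language_model(system: str, user: str) -> str:
    # Stand-in for a real chat-model API call; here it just fakes an
    # expansion so the example is runnable.
    return (f"{user}, photorealistic, glossy paint, golden-hour lighting, "
            f"shallow depth of field, 35mm photograph")

def call_image_model(prompt: str) -> str:
    # Stand-in for a real image-generation API call.
    return f"<image generated from prompt: {prompt!r}>"

def generate_image(user_request: str) -> str:
    # Step 1: a language model behind the scenes expands the terse request.
    detailed_prompt = call_language_model(REWRITE_INSTRUCTION, user_request)
    # Step 2: the image model only ever sees the rewritten prompt.
    return call_image_model(detailed_prompt)

print(generate_image("draw me a beautiful car"))
```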
For the past few years, the highest-cost human data has been in these preferences, which is comparing: I would say highest cost and highest total usage. So a lot of money has gone to these pairwise comparisons, where you have two model outputs and a human is comparing between the two of them. In earlier years, there was a lot of this instruction-tuning data.
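To give a sense of what one of those pairwise comparisons looks like as data, here is a hedged sketch of a single preference record. The field names are made up for illustration; real labeling pipelines differ and carry far more metadata (annotator IDs, rubric scores, and so on).

```python
from dataclasses import dataclass

# Illustrative schema for one pairwise preference annotation. Field names
# are hypothetical, not any lab's actual format.
@dataclass
class PreferencePair:
    prompt: str       # the instruction or query shown with both outputs
    response_a: str   # output from one model or sampling run
    response_b: str   # output from another
    preferred: str    # "a", "b", or "tie", as judged by a human annotator

example = PreferencePair(
    prompt="Explain why the sky is blue in two sentences.",
    response_a="Sunlight scatters off air molecules, and blue light "
               "scatters the most, so the sky looks blue.",
    response_b="Because the ocean reflects into the atmosphere.",
    preferred="a",
)
print(example)
```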