
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, on our eval benchmark, DeepSeek V3 is really similar. We have a safety benchmark to understand if it will say harmful things and things like that, and that's what draws it down most of the way.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so we have a suite of 10 evaluations. This is standard practice in post-training: you choose the evaluations you care about. In academics, in smaller labs, you'll have fewer evaluations. In companies, you'll have one domain that you really care about. In frontier labs, you'll have tens to twenties to maybe even like 100 evaluations of specific things.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So we'd choose a representative suite of things that look like chat; precise instruction following, which is like "respond only in emojis" — does the model follow weird things like that?; math; code. And you create a suite like this, so safety would be one of 10, in that type of suite where you have: what does the broader AI community care about?
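The kind of representative suite described above can be sketched as a simple list; these category names are illustrative placeholders, not an actual lab's eval suite.

```python
# Hypothetical sketch of a ~10-category post-training eval suite like the
# one described: safety is just one category among the rest. All names
# beyond the four mentioned in the quote (chat, instruction following,
# math, code) are made up for illustration.
EVAL_SUITE = [
    "chat",
    "precise_instruction_following",  # e.g. "respond only in emojis"
    "math",
    "code",
    "safety",
    "reasoning",            # illustrative
    "summarization",        # illustrative
    "multilinguality",      # illustrative
    "factuality",           # illustrative
    "long_context",         # illustrative
]

# Safety is one of 10 categories, so it carries 1/10 of a simple average.
assert len(EVAL_SUITE) == 10 and "safety" in EVAL_SUITE
```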

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And for example, in comparison to DeepSeek, it would be something like: our average eval score for our model would be 80 including safety, and similar without; and DeepSeek would be like a 79% average score without safety, and their safety score would bring it down to like 76 on average.
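The averaging effect described in the quote can be reproduced with a small sketch; the per-category numbers below are illustrative, chosen only so that the means come out to the 79-without-safety versus 76-with-safety figures mentioned, and are not real benchmark results.

```python
# Hypothetical sketch of how a low safety score pulls down a model's mean
# eval score. All scores are illustrative, not actual benchmark numbers.

def mean_eval(scores: dict, include_safety: bool = True) -> float:
    """Average the eval scores, optionally excluding the safety eval."""
    kept = {k: v for k, v in scores.items() if include_safety or k != "safety"}
    return sum(kept.values()) / len(kept)

# Illustrative per-category scores for a DeepSeek-like model.
deepseek_like = {
    "chat": 80,
    "instruction_following": 78,
    "math": 82,
    "code": 76,
    "safety": 64,  # low safety score drags the overall average down
}

print(mean_eval(deepseek_like, include_safety=False))  # 79.0
print(mean_eval(deepseek_like, include_safety=True))   # 76.0
```

With safety excluded the mean is (80+78+82+76)/4 = 79; including the low safety score it drops to 380/5 = 76, matching the gap described.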

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so this is something that internally it's like, I don't want to win only by how you shape the eval benchmark. So if there's something that's like, people may or may not care about safety in their model: safety can come downstream, safety can be when you host the model for an API. Safety is addressed in a spectrum of locations and applications.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So it's like, if you want to say that you have the best recipe, you can't just gate it on these things that some people might not want. And this is because of the rate of progress: we benefit if we can release a model later, because we have more time to learn new techniques like this RL technique. We had started this in the fall, and it's now really popular with reasoning models.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The next thing to do for open-source post-training is to scale up verifiers, to scale up data, to replicate some of DeepSeek's results. And it's awesome that we have a paper to draw on, and it makes it a lot easier. And that's the type of thing that is going on among academic and closed frontier research in AI.
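A "verifier" in the sense mentioned above is, at its simplest, a function that checks a model's output against ground truth and returns a reward for RL. This is a deliberately naive hypothetical sketch; the extraction logic and reward values are illustrative, not any lab's actual implementation.

```python
# Hypothetical sketch of a verifiable-reward function for RL on math-style
# tasks: extract the model's final answer and compare it to ground truth.
# The extraction here is intentionally naive, for illustration only.

def math_verifier(model_answer: str, ground_truth: str) -> float:
    """Return reward 1.0 if the final answer matches ground truth, else 0.0."""
    # Naive extraction: take the last whitespace-separated chunk and strip
    # trailing punctuation.
    final = model_answer.strip().split()[-1].rstrip(".!?")
    return 1.0 if final == ground_truth else 0.0

print(math_verifier("The answer is 42.", "42"))  # 1.0
print(math_verifier("I think it's 41.", "42"))   # 0.0
```

Scaling up verifiers then means writing many such checkers (math, code execution, instruction constraints) so that RL training has cheap, reliable reward signals across domains.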
