
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, on our eval benchmark, DeepSeek V3 is really similar. We have a safety benchmark to understand if it will say harmful things and things like that, and that's what draws it down most of the way.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so we have a suite of 10 evaluations. This is standard practice in post-training: you choose the evaluations you care about. In academics, in smaller labs, you'll have fewer evaluations. In companies, you'll have one domain that you really care about. In frontier labs, you'll have tens to twenties to maybe even like 100 evaluations of specific things.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So we'd choose a representative suite of things that look like chat; precise instruction following, which is like "respond only in emojis" — does the model follow weird things like that?; math; code. And you create a suite like this, so safety would be one of 10, in that type of suite where you have: what does the broader AI community care about?
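The kind of representative suite described above can be sketched as a simple list; these category names are illustrative placeholders, not an actual lab's eval suite.

```python
# Hypothetical sketch of a ~10-category post-training eval suite like the
# one described: safety is just one category among the rest. All names
# beyond the four mentioned in the quote (chat, instruction following,
# math, code) are made up for illustration.
EVAL_SUITE = [
    "chat",
    "precise_instruction_following",  # e.g. "respond only in emojis"
    "math",
    "code",
    "safety",
    "reasoning",            # illustrative
    "summarization",        # illustrative
    "multilinguality",      # illustrative
    "factuality",           # illustrative
    "long_context",         # illustrative
]

# Safety is one of 10 categories, so it carries 1/10 of a simple average.
assert len(EVAL_SUITE) == 10 and "safety" in EVAL_SUITE
```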

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And for example, in comparison to DeepSeek, it would be something like: our average eval score for our model would be 80 including safety, and similar without; and DeepSeek would be like a 79% average score without safety, and their safety score would bring it down to like 76 on average.
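The averaging effect described in the quote can be reproduced with a small sketch; the per-category numbers below are illustrative, chosen only so that the means come out to the 79-without-safety versus 76-with-safety figures mentioned, and are not real benchmark results.

```python
# Hypothetical sketch of how a low safety score pulls down a model's mean
# eval score. All scores are illustrative, not actual benchmark numbers.

def mean_eval(scores: dict, include_safety: bool = True) -> float:
    """Average the eval scores, optionally excluding the safety eval."""
    kept = {k: v for k, v in scores.items() if include_safety or k != "safety"}
    return sum(kept.values()) / len(kept)

# Illustrative per-category scores for a DeepSeek-like model.
deepseek_like = {
    "chat": 80,
    "instruction_following": 78,
    "math": 82,
    "code": 76,
    "safety": 64,  # low safety score drags the overall average down
}

print(mean_eval(deepseek_like, include_safety=False))  # 79.0
print(mean_eval(deepseek_like, include_safety=True))   # 76.0
```

With safety excluded the mean is (80+78+82+76)/4 = 79; including the low safety score it drops to 380/5 = 76, matching the gap described.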

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so this is something that internally it's like, I don't want to win only by how you shape the eval benchmark. So if there's something that's like, people may or may not care about safety in their model: safety can come downstream, safety can be when you host the model for an API. Safety is addressed in a spectrum of locations and applications.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So it's like, if you want to say that you have the best recipe, you can't just gate it on these things that some people might not want. And this is because of the rate of progress: we benefit if we can release a model later, because we have more time to learn new techniques like this RL technique. We had started this in the fall, and it's now really popular with reasoning models.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The next thing to do for open-source post-training is to scale up verifiers, to scale up data, to replicate some of DeepSeek's results. And it's awesome that we have a paper to draw on, and it makes it a lot easier. And that's the type of thing that is going on among academic and closed frontier research in AI.
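A "verifier" in the sense mentioned above is, at its simplest, a function that checks a model's output against ground truth and returns a reward for RL. This is a deliberately naive hypothetical sketch; the extraction logic and reward values are illustrative, not any lab's actual implementation.

```python
# Hypothetical sketch of a verifiable-reward function for RL on math-style
# tasks: extract the model's final answer and compare it to ground truth.
# The extraction here is intentionally naive, for illustration only.

def math_verifier(model_answer: str, ground_truth: str) -> float:
    """Return reward 1.0 if the final answer matches ground truth, else 0.0."""
    # Naive extraction: take the last whitespace-separated chunk and strip
    # trailing punctuation.
    final = model_answer.strip().split()[-1].rstrip(".!?")
    return 1.0 if final == ground_truth else 0.0

print(math_verifier("The answer is 42.", "42"))  # 1.0
print(math_verifier("I think it's 41.", "42"))   # 0.0
```

Scaling up verifiers then means writing many such checkers (math, code execution, instruction constraints) so that RL training has cheap, reliable reward signals across domains.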
