
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It has a different flavor to it. Its behavior is less expressive than something like o1; it's on fewer tracks. Qwen released a model last fall, QwQ, which was their preview reasoning model. And DeepSeek had R1 Lite last fall. These models kind of felt like they're on rails, where they really, really only can do math and code. And o1 can answer anything.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It might not be perfect for some tasks, but it's flexible. It has some richness to it. And this is kind of the art: is a model a little bit undercooked? It's good to get a model out the door, but it's hard to gauge, and it takes a lot of taste to ask, is this a full-fledged model? Can I use this for everything? They're probably more similar for math and code.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

My quick read is that Gemini Flash is not trained the same way as o1, but by taking an existing training stack and adding reasoning to it. So taking a more normal training stack and adding reasoning to it. And I'm sure they're going to have more. I mean, they've done quick releases on Gemini Flash, the reasoning one, and this is the second version from the holidays. It's evolving fast and

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

it takes longer to make this training stack where you're doing this at large scale.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

The reason I can ramble about this so much is that we've been working on this at AI2 since before o1 was fully available to everyone and before R1. It's essentially using this RL training for fine-tuning.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

We use this in our Tulu series of models, and you can elicit the same behaviors, where the model says things like "wait" and so on, but it's so late in the training process that this kind of reasoning expression is much lighter. So there's essentially a gradation, and just how much of this RL training you put into it determines how the output looks.
