Nathan Lambert

Speaker
1668 total appearances

Podcast Appearances

Today on the AI Daily Brief, the skills we need to develop for the Code HEI era.

The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.

First of all, today's episode is brought to you by Zencoder, Robots and Pencils, and Superintelligent.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah. So there's two main techniques that they implemented that are probably the majority of their efficiency. And then there's a lot of implementation details that maybe we'll gloss over or get into later that sort of contribute to it. But those two main things are, one is they went to a mixture of experts model, which we'll define in a second.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And then the other thing is that they invented this new technique called MLA, multi-head latent attention. Both of these are big deals. Mixture of experts is something that's been in the literature for a handful of years. And OpenAI with GPT-4 was the first one to productize a mixture of experts model.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And what this means is when you look at the common models around that most people have been able to interact with that are open, right? Think Llama. Llama is a dense model, i.e., every single parameter or neuron is activated as you're going through the model for every single token you generate, right? Now, with a mixture of experts model, you don't do that, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

How does the human brain actually work, right? It's like, oh, well, my visual cortex is active when I'm thinking about, you know, vision tasks and like, you know, other things, right? My amygdala is active when I'm scared, right? These different aspects of your brain are focused on different things. A mixture of experts model attempts to approximate this to some extent.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's nowhere close to what a brain architecture is, but different portions of the model activate, right? You'll have a set number of experts in the model and a set number that are activated each time. And this dramatically reduces both your training and inference costs.
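
As a rough illustration of "a set number of experts in the model and a set number that are activated each time", here is a minimal sketch of top-k routing. Everything in it is an assumption made for illustration: the names `route_token` and `moe_layer`, the toy scalar experts, and the router scores are hypothetical, and this is not the routing scheme of any particular model discussed in the episode.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts on x and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_token(router_scores, k))

# Hypothetical toy setup: 8 experts, 2 active per token.
# Expert i simply multiplies its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_token(router_scores, k=2)
output = moe_layer(1.0, experts, router_scores, k=2)
print(selected)
print(output)
```

Only two of the eight toy experts are evaluated for this token; the other six are never called, which is where the compute saving would come from. In a real model the router scores are produced by a learned layer per token rather than hard-coded.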

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Because now if you think about the parameter count as the sort of total embedding space for all of this knowledge that you're compressing down during training, when you're embedding this data in, instead of having to activate every single parameter every single time you're training or running inference, now you can just activate a subset.
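
The point about activating only a subset can be made concrete with some back-of-the-envelope arithmetic. This is a sketch under stated assumptions: the helper `active_parameters` and all of the parameter counts below are made up for illustration and do not describe any real model.

```python
def active_parameters(shared, per_expert, n_experts, k):
    """Return (total, active) parameter counts for a simple MoE layout.

    shared     -- parameters every token uses (e.g. attention, embeddings)
    per_expert -- parameters in one expert
    n_experts  -- experts stored in the model
    k          -- experts activated per token
    """
    total = shared + per_expert * n_experts
    active = shared + per_expert * k
    return total, active

# Hypothetical numbers, chosen only to show the arithmetic.
total, active = active_parameters(
    shared=2_000_000_000,
    per_expert=1_000_000_000,
    n_experts=16,
    k=2,
)
fraction = active / total
print(total, active, round(fraction, 3))
```

With these made-up numbers, the model stores 18 billion parameters but touches only 4 billion of them per token, a bit over a fifth. A dense model of the same total size would use all 18 billion for every token.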
