Nathan Lambert
To give context, right? Part of why everyone was freaking out about this was that they reached these capabilities at all. The other aspect is that they did it so cheap, right? And the "so cheap" part, we kind of talked about on the training side, why it was so cheap.
So I think there's a couple factors here, right? One is that they do have model architecture innovations, right? This MLA, this new attention that they've done, is different from the attention in "Attention Is All You Need," the original transformer attention, right? Now, others have already innovated here too.
There's a lot of work like MQA, GQA, local-global, all these different innovations that try to bend the curve, right? It's still quadratic, but the constant is now smaller, right?
It's an 80% to 90% reduction versus the original, but even versus what people are actually doing, it's still an innovation.
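The memory savings being discussed can be sketched with some back-of-envelope KV-cache arithmetic. The model dimensions below are illustrative assumptions, not any specific model's real configuration; the point is only how sharing KV heads (GQA, MQA) shrinks the cache relative to standard multi-head attention (MHA).

```python
# Back-of-envelope KV-cache sizes for multi-head (MHA), grouped-query (GQA),
# and multi-query (MQA) attention. All dimensions are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 4096

mha = kv_cache_bytes(n_layers, n_heads, head_dim, seq_len)  # one KV head per query head
gqa = kv_cache_bytes(n_layers, 8, head_dim, seq_len)        # 8 shared KV heads
mqa = kv_cache_bytes(n_layers, 1, head_dim, seq_len)        # a single shared KV head

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.2f} GiB, {100 * (1 - size / mha):.0f}% saved vs MHA")
```

Note the attention computation itself is still quadratic in sequence length; what shrinks is the constant, the per-token memory (and bandwidth) cost of the cache.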
Well, and not just that, right? Other people have implemented techniques like local-global, sliding window, and GQA/MQA. But anyways, DeepSeek has their attention mechanism, MLA, as a true architectural innovation. They did tons of experimentation, and it dramatically reduces the memory pressure. It's still there, right? It's still attention. It's still quadratic.
It's just dramatically reduced it relative to prior forms.
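The core memory idea behind MLA can be sketched as follows: instead of caching full per-head keys and values, cache one low-rank latent vector per token and up-project it back to keys and values at attention time. This is a minimal sketch of the low-rank compression idea only; the dimensions and weight names are illustrative assumptions, not DeepSeek's exact configuration.

```python
# Sketch of the memory idea behind multi-head latent attention (MLA):
# cache a compressed latent per token, re-expand to K and V on the fly.
# Dimensions are illustrative assumptions, not DeepSeek's real config.
import numpy as np

d_model, n_heads, head_dim, d_latent, seq_len = 4096, 32, 128, 512, 8

rng = np.random.default_rng(0)
h = rng.standard_normal((seq_len, d_model))  # token hidden states

W_down = rng.standard_normal((d_model, d_latent)) * 0.02             # compress to latent
W_up_k = rng.standard_normal((d_latent, n_heads * head_dim)) * 0.02  # re-expand keys
W_up_v = rng.standard_normal((d_latent, n_heads * head_dim)) * 0.02  # re-expand values

c_kv = h @ W_down   # this latent is all that needs to be cached
k = c_kv @ W_up_k   # full keys recovered at attention time
v = c_kv @ W_up_v   # full values recovered at attention time

full_cache = 2 * n_heads * head_dim  # floats cached per token with standard MHA (K and V)
mla_cache = d_latent                 # floats cached per token with MLA
print(f"cache floats per token: MHA {full_cache}, MLA {mla_cache} "
      f"({full_cache / mla_cache:.0f}x smaller)")
```

With these assumed dimensions, the per-token cache shrinks by a large constant factor, while the attention computation itself remains quadratic, which matches the point being made here.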
So I think this is very important, right? There's that drastic gap between OpenAI's pricing and DeepSeek's. And DeepSeek is offering the same model, because they open-weighted it, to everyone else, at a much lower price than what others are able to serve it for.