Dylan Patel
So it's not necessarily saying that more people have the DeepSeek app than the ChatGPT app. But it is... still remarkable. Claude has never hit number one in the app store, even though everyone in San Francisco is like, oh my God, you've got to use Claude, don't use ChatGPT. And DeepSeek hit this.
They also launched an API product recently where you can ping their API and get these super long responses for R1 out. At the same time as these are out, we'll get to what's happened to them.
Because the model weights for DeepSeek R1 are openly available and the license is very friendly, the MIT license permits commercial use, all of these mid-sized companies and big companies are trying to be first to serve R1 to their users. We were trying to evaluate R1 because we have really similar research going on. We released the model and we're trying to compare to it.
And out of all the companies that are, quote-unquote, serving R1, they're doing it at prices that are way higher than the DeepSeek API. Most of them barely work and the throughput is really low.
Related to our previous discussion, this multi-head latent attention can save about 80% to 90% in memory from the attention mechanism, which helps especially at long context.
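To make that savings figure concrete, here is a rough back-of-the-envelope sketch of where it comes from: standard multi-head attention caches full per-head keys and values for every token, while multi-head latent attention (MLA) caches only a small compressed latent per token. The dimensions below are illustrative assumptions loosely modeled on DeepSeek's published configs, not exact figures, and the exact percentage depends on the baseline you compare against (full multi-head attention vs. grouped-query attention).

```python
# Per-token, per-layer KV-cache size: standard multi-head attention (MHA)
# vs. multi-head latent attention (MLA). All dimensions are illustrative
# assumptions, not DeepSeek's exact numbers.

n_heads = 128      # attention heads
head_dim = 128     # per-head dimension
latent_dim = 512   # compressed KV latent cached by MLA (assumed)
rope_dim = 64      # decoupled RoPE key dimension cached by MLA (assumed)
bytes_per = 2      # fp16/bf16 element size

# MHA caches full keys and values for every head.
mha_kv = 2 * n_heads * head_dim * bytes_per   # keys + values

# MLA caches one shared latent plus a small RoPE key per token.
mla_kv = (latent_dim + rope_dim) * bytes_per

print(f"MHA KV cache per token per layer: {mha_kv} bytes")
print(f"MLA KV cache per token per layer: {mla_kv} bytes")
print(f"reduction: {1 - mla_kv / mha_kv:.1%}")
```

With these assumed dimensions the reduction versus full multi-head attention is even larger than 80-90%; the quoted range is plausible against grouped-query-attention baselines that already shrink the KV cache. Either way, the savings scale with context length, since the cache grows linearly in tokens.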
This 80% to 90% doesn't say that the whole model is 80% to 90% cheaper, just this one part of it.
We think that OpenAI had a large margin built in. There's multiple factors.