Dylan Patel
So without fully open-source models where you have access to this data, it's hard to know, or it's much harder to replicate. So we'll get into cost numbers for DeepSeek V3, mostly in terms of GPU hours and how much you could pay to rent those yourself. But without the data, the replication cost is going to be far, far higher. And the same goes for the code.
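To make the GPU-hour framing concrete, here is a back-of-the-envelope sketch using the figures DeepSeek published in the V3 technical report. The $2-per-H800-hour rental rate is the paper's own assumption, and the total covers only the final training run, not data, prior experiments, or staff.

```python
# Back-of-the-envelope replication cost for DeepSeek V3, using the
# GPU-hour breakdown from the V3 technical report. The $2/hour H800
# rental rate is the paper's own assumption; real prices vary by
# provider and contract.
PRETRAIN_GPU_HOURS = 2_664_000   # pre-training
CONTEXT_EXT_GPU_HOURS = 119_000  # long-context extension
POST_TRAIN_GPU_HOURS = 5_000     # post-training (SFT + RL)
RENTAL_USD_PER_GPU_HOUR = 2.0    # assumed H800 rental price

total_hours = PRETRAIN_GPU_HOURS + CONTEXT_EXT_GPU_HOURS + POST_TRAIN_GPU_HOURS
print(f"Total GPU hours: {total_hours:,}")                            # 2,788,000
print(f"Rental cost: ${total_hours * RENTAL_USD_PER_GPU_HOUR:,.0f}")  # $5,576,000
```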
Yeah. DeepSeek is doing fantastic work for disseminating understanding of AI. Their papers are extremely detailed in what they do, and for other companies and teams around the world, they're very actionable in terms of improving your own training techniques. And we'll talk about licenses more. The DeepSeek R1 model has a very permissive license. It's called the MIT license.
That effectively means there are no downstream restrictions on commercial use. There are no use-case restrictions. You can use the outputs from the models to create synthetic data. And this is all fantastic. I think the closest peer is something like Llama, where you have the weights and you have a technical report. And the technical report is very good for Llama.
One of the most-read PDFs of last year is the Llama 3 paper. But in some ways it's slightly less actionable. It has less detail on the training specifics, fewer plots, and so on. And the Llama 3 license is more restrictive than MIT. And then between the DeepSeek custom license and the Llama license, we could get into this whole rabbit hole.
I think we'll make sure to go down the license rabbit hole before we get to specifics.
Yeah, especially in the DeepSeek V3 paper, which is their pre-training paper, they were very clear that they are making interventions on the technical stack at many different levels. For example, to get highly efficient training, they're making modifications at or below the CUDA layer for NVIDIA chips.
I have never worked there myself, and there are only a few people in the world who do that very well, and some of them are at DeepSeek. These types of people are at DeepSeek and at the leading American frontier labs, but there are not many places.
Yeah, so these weights that you can download from Hugging Face or other platforms are very big matrices of numbers. You can download them to a computer in your own house that has no internet, you can run this model, and you're totally in control of your data.
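As a minimal sketch of what that looks like in practice, assuming the Hugging Face transformers library: the repo ID below points at one of the smaller distilled R1 checkpoints, since the full 671B-parameter model will not fit on a typical home computer. After the one-time download, local_files_only=True keeps every later run entirely on your own hardware.

```python
# Minimal sketch: run open weights fully offline with Hugging Face
# transformers. Uses a small distilled R1 checkpoint; the full
# 671B-parameter model is far too large for a home machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# First run (internet required): downloads and caches the weight matrices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Every later run (no internet needed): load from the local cache only.
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)

# Your prompt and the model's output never leave your machine.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```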