Dylan Patel
Yeah, so these weights that you can download from Hugging Face or other platforms are very big matrices of numbers. You can download them to a computer in your own house that has no internet and you can run this model and you're totally in control of your data.
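For readers who want to try this, here is a minimal sketch of running a downloaded open-weights model entirely offline with the Hugging Face transformers library; the local path and prompt are placeholders for whatever checkpoint you have saved.

```python
# Minimal sketch: run an open-weights model entirely on your own machine.
# The path below is a placeholder for a locally downloaded checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./my-local-model"  # directory holding the downloaded weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Nothing here touches the network; the prompt and output never leave the box.
```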
That is something that is different from how a lot of language model usage is actually done today, which is mostly through APIs, where you send your prompt to GPUs run by certain companies. And these companies will have different policies on how your data is stored, whether it is used to train future models, where it is stored, whether it is encrypted, and so on.
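By contrast, API usage looks roughly like the sketch below. The endpoint URL, model name, and key are placeholders, but the key point is that the prompt travels over the network and is processed on the provider's hardware, under the provider's data policies.

```python
# Rough sketch of API-based usage: the prompt leaves your machine and is
# processed on the provider's GPUs, under the provider's data policies.
# The endpoint URL, model name, and API key are placeholders, not real values.
import requests

response = requests.post(
    "https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "some-hosted-model",
        "messages": [{"role": "user", "content": "Summarize this confidential memo for me."}],
    },
)
print(response.json()["choices"][0]["message"]["content"])
```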
So with open weights, you have the fate of your data in your own hands. And that is something that is deeply connected to the soul of open-source computing.
Yes. So for one, I very much understand why many people are confused by these two model names. I would say the best way to think about this is that when training a language model, you have what is called pre-training, which is when you're training on large amounts of mostly internet text, trying to predict the next token.
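As a rough illustration of what "predicting the next token" means in practice, here is a toy sketch of the pre-training loss using a small stand-in model; the real pre-training runs do exactly this, just over trillions of tokens on large GPU clusters.

```python
# Toy sketch of the pre-training objective: predict each next token of a text.
# A small stand-in model (gpt2) is used here; real pre-training does the same
# thing at vastly larger scale over mostly internet text.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Language models are trained on internet text.", return_tensors="pt")
# Passing the inputs as labels makes the library shift them internally and
# compute cross-entropy on predicting each next token.
out = model(**batch, labels=batch["input_ids"])
print(out.loss)      # the scalar that pre-training minimizes
out.loss.backward()  # one gradient step of next-token prediction
```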
And what to know about these new DeepSeek models is that they do this large-scale internet pre-training once to get what is called DeepSeek V3 Base. This is a base model. It's just going to finish your sentences for you. It's going to be harder to work with than ChatGPT.
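To make the "it just finishes your sentences" point concrete, here is a sketch using a small open base model as a stand-in (the actual DeepSeek V3 base checkpoint is far too large to run casually); the behavior it illustrates is generic to base models.

```python
# Sketch of base-model behavior: it only continues text, it does not "answer".
# gpt2 is used here as a small stand-in base model; DeepSeek-V3-Base behaves
# the same way in kind, just at a much higher quality level.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Q: What is the capital of France?\nA:"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
# A base model may answer, ramble, or invent more Q&A pairs; it has not been
# post-trained to follow instructions the way a chat assistant has.
```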
And then what DeepSeek did is they've done two different post-training regimes to make the models have specific desirable behaviors. The more normal model, in terms of the last few years of AI, an instruct model, a chat model, a quote-unquote aligned model, a helpful model, there are many ways to describe this, is made with more standard post-training.
So this is things like instruction tuning, reinforcement learning from human feedback. We'll get into some of these words. And this is what they did to create the DeepSeek V3 model. This was the first model to be released, and it is very high-performance. It's competitive with GPT-4, Llama 405B, and so on.
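One concrete way to see what that post-training buys you: an instruct or chat model expects its input wrapped in a conversation format. The sketch below uses a placeholder model path; any instruct-tuned checkpoint exposes the same chat-template interface in transformers.

```python
# Sketch of talking to an instruct/chat model: the tokenizer's chat template
# wraps the user turn in the special tokens the model saw during post-training
# (instruction tuning, RLHF), which is what makes it behave like an assistant.
# The path is a placeholder for any locally downloaded instruct checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./my-instruct-model")

messages = [{"role": "user", "content": "Explain what a base model is in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # the formatted string the model actually sees
```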