Dylan Patel
Like, I'm very bad at those things, and thankfully I've, like, been able to surround myself in my life, whether it's through birth or not, with people who help me with the things I'm bad at, because I'm very bad at a lot of things.
When I don't, like, call people or, like,
be considerate of what they're thinking because I'm just vibing and I'm doing whatever, you know, I'm like focused in on like this path.
That path ends up hurting someone else, right?
Whether it's like, hey, I didn't call someone or I didn't like think about their feelings when I did an action or when I said something, but that makes me an asshole.
And yes, I should be more conscious of this.
And I try to be, but it's like, it's just one of the things I'm going to wrestle with in my life forever.
And a lot of times I don't even realize I'm being a freaking idiot until my brother's like, you're a freaking idiot.
That's the kindest thing anyone's ever done for me is like my brother through my whole life.
Thank you so much.
Yeah.
Yeah, so DeepSeek v3 is a new mixture-of-experts transformer language model from DeepSeek, who is based in China. They have some new specifics in the model that we'll get into. Largely, this is an open-weight model, and it's an instruction model like what you would use in ChatGPT. They also released what is called the base model, which is before these techniques of post-training.
Most people use instruction models today, and those are what's served in all sorts of applications. This was released on, I believe, December 26th, or that week. And then weeks later, on January 20th, DeepSeek released DeepSeek R1, which is a reasoning model, which... really accelerated a lot of this discussion.
This reasoning model has a lot of overlapping training steps with DeepSeek v3, and it's confusing that you have a base model called v3 that you do something to to get a chat model, and then you do some different things to get a reasoning model. I think a lot of the AI industry is going through this challenge of communications right now, where OpenAI makes fun of their own naming schemes.