And the final stage is much newer, and it will link to what is done in R1. "Reasoning models" is, I think, OpenAI's name for this. They had a new API in the fall, which they called the Reinforcement Fine-Tuning API. The idea is that you use the techniques of reinforcement learning, which is a whole framework of AI. There's a deep literature here.
To summarize, it's often known as trial-and-error learning: the subfield of AI where you're trying to make sequential decisions in a potentially noisy environment. There are a lot of directions we could go down there, but here it means fine-tuning language models so they generate an answer, and then you check to see if the answer matches the true solution.
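To make that concrete, here's a toy sketch of the generate-and-check loop. Everything in it is a stand-in, not any lab's actual training code: `generate` and `update_policy` are hypothetical placeholders for sampling from the model and for a policy-gradient step.

```python
# Toy sketch of RL fine-tuning with answer checking.
# All functions here are hypothetical stand-ins, not a real training library.

def generate(model, question, num_samples=4):
    # Stand-in: sample several candidate answers from the model.
    return [f"candidate-{i}" for i in range(num_samples)]

def reward(answer, true_solution):
    # 1 if the model's answer matches the known solution, else 0.
    return 1.0 if answer == true_solution else 0.0

def update_policy(model, question, answers, rewards):
    # Stand-in for a policy-gradient update (PPO/GRPO-style):
    # raise the probability of the high-reward answers.
    pass

def rl_finetune(model, dataset, epochs=3):
    for _ in range(epochs):
        for question, true_solution in dataset:
            answers = generate(model, question)                    # try the question
            rewards = [reward(a, true_solution) for a in answers]  # check the work
            update_policy(model, question, answers, rewards)       # reinforce what worked
```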
For math or code, you have an exactly correct answer for math, and you can have unit tests for code. What we're doing is checking the language model's work, and we're giving it multiple opportunities on the same questions to see if it gets them right. If you keep doing this, the models can learn to improve in verifiable domains to a great extent. It works really well.
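Here's what "verifiable" means concretely. The two checkers below are simplified illustrations of the idea, not any lab's actual graders: exact match for a math answer, and a pass rate over unit tests for code.

```python
# Illustrative verifiable rewards: exact-match for math, unit tests for code.
# Simplified sketches, not production grading infrastructure.

def math_reward(model_answer: str, reference: str) -> float:
    # Exact string match after light normalization.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def code_reward(model_code: str, tests: list) -> float:
    # Run the generated code against unit tests; reward is the pass rate.
    namespace = {}
    try:
        exec(model_code, namespace)  # define the candidate function
    except Exception:
        return 0.0
    passed = 0
    for func_name, args, expected in tests:
        try:
            if namespace[func_name](*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

# Example: grade a generated `add` function against two tests.
tests = [("add", (2, 3), 5), ("add", (-1, 1), 0)]
print(code_reward("def add(a, b):\n    return a + b", tests))  # 1.0
```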
It's a newer technique in the academic literature. It's been used at frontier labs in the US, which don't share every detail, for multiple years. So this is the idea of using reinforcement learning with language models, and it has been taking off, especially in this DeepSeek moment.
So let's start with DeepSeek V3 again. It's what most people would have tried, or something like it. You ask it a question, and it'll start generating tokens very fast. Those tokens will look like a very human-legible answer. It'll be some sort of markdown list; it might have formatting to help draw you to the core details in the answer. And it'll generate tens to hundreds of tokens.
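If you want to see this yourself: DeepSeek serves an OpenAI-compatible API, so something like the sketch below should stream tokens back as they're generated. The base URL and model name are my assumptions from DeepSeek's public docs, so verify them before relying on this.

```python
# Sketch: streaming an answer from DeepSeek V3 via its OpenAI-compatible API.
# The base_url and model name are assumptions taken from DeepSeek's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="deepseek-chat",  # assumed name for the V3 chat model
    messages=[{"role": "user", "content": "Explain mixture-of-experts briefly."}],
    stream=True,  # tokens arrive chunk by chunk, very fast
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```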
A token is normally a whole word for common words, or a subword piece within a longer word. And it'll look like a very high-quality Reddit or Stack Overflow answer. These models are really getting good at doing this across a wide variety of domains. Even things that are close to the fringe of knowledge, things where you're an expert, they will still be fairly good at.
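You can see the word-versus-subword split with a tokenizer library. The sketch below uses OpenAI's tiktoken just to show the general idea; DeepSeek's models have their own tokenizer, so the exact splits will differ.

```python
# Illustrating word vs. subword tokens with OpenAI's tiktoken library.
# (DeepSeek uses its own tokenizer; this just shows the general idea.)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["the", "cat", "reinforcement", "tokenization"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")

# Common words like "the" come out as a single token; longer or rarer
# words split into several subword pieces.
```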
Even on cutting-edge AI topics that I do research on, these models are capable as a study aid, and they're regularly updated. Where this changes is with DeepSeek R1 and what are called these reasoning models: when you see tokens coming from these models, to start, there will be a large chain-of-thought process.
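In R1's open-weights output format, that chain of thought is wrapped in `<think>` tags before the final answer, so you can split the two apart. This is a minimal sketch assuming that tag format:

```python
# Minimal sketch: separating DeepSeek R1's chain of thought from its
# final answer, assuming the open-weights <think>...</think> format.
import re

def split_reasoning(completion: str):
    match = re.search(r"<think>(.*?)</think>(.*)", completion, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", completion.strip()  # no reasoning block found

raw = "<think>User asks for 12*13. 12*13 = 156.</think>The answer is 156."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```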