Dylan Patel
๐ค SpeakerAppearances Over Time
Podcast Appearances
And there's a lot of numbers that are thrown around in AI training, like how many floating point operations or flops are used. And then you can also look at how many hours of these GPUs that are used. And it's largely one loss function taken to a very large amount of compute usage, you just you set up really efficient systems. And then at the end of that, you have the space model.
And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.
And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.
And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.
These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this
These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this
These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this
formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.
formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.
formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.
And then there's two other categories of loss functions that are being used today. One I will classify as preference fine tuning. Preference fine tuning is a generalized term for what came out of reinforcement learning from human feedback, which is RLHF. This reinforcement learning from human feedback is credited as the technique that helped
And then there's two other categories of loss functions that are being used today. One I will classify as preference fine tuning. Preference fine tuning is a generalized term for what came out of reinforcement learning from human feedback, which is RLHF. This reinforcement learning from human feedback is credited as the technique that helped
And then there's two other categories of loss functions that are being used today. One I will classify as preference fine tuning. Preference fine tuning is a generalized term for what came out of reinforcement learning from human feedback, which is RLHF. This reinforcement learning from human feedback is credited as the technique that helped
ChatGPT breakthrough is a technique to make the responses that are nicely formatted, like these Reddit answers, more in tune with what a human would like to read. This is done by collecting pairwise preferences from actual humans out in the world to start. And now AIs are also labeling this data and we'll get into those trade-offs.
ChatGPT breakthrough is a technique to make the responses that are nicely formatted, like these Reddit answers, more in tune with what a human would like to read. This is done by collecting pairwise preferences from actual humans out in the world to start. And now AIs are also labeling this data and we'll get into those trade-offs.
ChatGPT breakthrough is a technique to make the responses that are nicely formatted, like these Reddit answers, more in tune with what a human would like to read. This is done by collecting pairwise preferences from actual humans out in the world to start. And now AIs are also labeling this data and we'll get into those trade-offs.
And you have this kind of contrastive loss function between a good answer and a bad answer. And the model learns to pick up these trends. There's different implementation ways. You have things called reward models. You could have direct alignment algorithms. There's a lot of really specific things you can do, but all of this is about fine tuning to human preferences.
And you have this kind of contrastive loss function between a good answer and a bad answer. And the model learns to pick up these trends. There's different implementation ways. You have things called reward models. You could have direct alignment algorithms. There's a lot of really specific things you can do, but all of this is about fine tuning to human preferences.
And you have this kind of contrastive loss function between a good answer and a bad answer. And the model learns to pick up these trends. There's different implementation ways. You have things called reward models. You could have direct alignment algorithms. There's a lot of really specific things you can do, but all of this is about fine tuning to human preferences.
And the final stage is much newer and will link to what is done in R1. And these reasoning models is, I think, OpenAI's name for this. They had this new API in the fall, which they called the Reinforcement Fine Tuning API. This is the idea that you use the techniques of reinforcement learning, which is a whole framework of AI. There's a deep literature here.