Dylan Patel
It has a different flavor to it. Its behavior is less expressive than something like o1. It feels like it's on fewer tracks. Qwen released a model last fall, QwQ, which was their preview reasoning model. And DeepSeek had R1-Lite last fall. These models kind of felt like they were on rails, where they really, really only could do math and code. And o1 can answer anything.
It might not be perfect for some tasks, but it's flexible. It has some richness to it. And this is kind of the art of it: is a model a little bit undercooked? It's good to get a model out the door, but it's hard to gauge, and it takes a lot of taste to say, is this a full-fledged model? Can I use this for everything? They're probably more similar for math and code.
My quick read is that Gemini Flash is not trained the same way as o1, but is taking an existing, more normal training stack and adding reasoning to it. And I'm sure they're going to have more. I mean, they've done quick releases on Gemini Flash reasoning, and this is the second version since the holidays. It's evolving fast, and
it takes longer to build this kind of training stack where you're doing this at large scale.
Yeah.
The reason I can ramble about this so much is that we've been working on this at AI2 since before o1 was fully available to everyone, and before R1, which is essentially using this RL training for fine-tuning.
We use this in our Tulu series of models, and you can elicit the same behaviors, where the model says things like "wait" and so on, but it comes so late in the training process that this kind of reasoning expression is much lighter. Yeah. So there's essentially a gradation, and just how much of this RL training you put into it determines how the output looks.