And it's putting all these together to try to create a recipe that people can use to fine-tune models like GPT-4 to their own domain.
Tulu has been a series of recipes for post-training. So we've done multiple models over the years.
Yeah, if you start with an open-weight base model, the whole model technically isn't open source, because you don't know what Llama put into it, which is why we have a separate thing that we'll get to. But it's just getting parts of the pipeline where people can zoom in and customize.
I know I hear from startups and businesses, they're like, okay, I can take this post-training and try to apply it to my domain. We talk about verifiers a lot. We use this idea called reinforcement learning with verifiable rewards, RLVR, kind of similar to RLHF. And we applied it to math, and for the model today, we applied it to the Llama 405B base model from last year.
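[To make the RLVR idea concrete, here is a minimal sketch of a "verifiable reward" for math: instead of a learned reward model as in RLHF, the reward is a programmatic check of the model's final answer against a known ground truth. The function names and the answer-extraction heuristic are illustrative assumptions, not Ai2's actual Tulu code.]

```python
# Minimal sketch of a verifiable reward for math-style prompts.
# Assumption: the reference answer is a single number and the model states it
# somewhere in its completion. This stands in for the reward model in an RL loop.

import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last number out of a model completion (assumed answer format)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth.strip() else 0.0

if __name__ == "__main__":
    # In a PPO-style loop, each sampled completion on a math prompt would be
    # scored with this function and the scalar used to update the policy.
    print(verifiable_reward("The total is 12 apples, so the answer is 12.", "12"))  # 1.0
    print(verifiable_reward("I think the answer is 15.", "12"))                     # 0.0
```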
And we have our other stuff. We have our instruction tuning and our preference tuning. But the math thing is interesting, which is that it's easier to improve this math benchmark. There's a benchmark, MATH, all capitals. Tough naming, when the benchmark's name is the area that you're evaluating. We're researchers, we're not brand strategists.
And this is something that the DeepSeek paper talked about as well, which is that at this bigger model scale, it's easier to elicit powerful capabilities with this RL training, and then they distill it down from that big model to the small model. And with this model we released today, we saw the same thing at Ai2. We don't have a ton of compute. We can't train 405B models all the time.
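[A rough sketch of what "distill it down from the big model to the small model" can look like in practice, assuming the common recipe of sampling solutions from the large teacher, keeping only the traces the verifier accepts, and using them as supervised fine-tuning data for the student. Model names, the Hugging Face `pipeline` usage, and the `rlvr_sketch` helper module are illustrative assumptions, not the exact pipeline discussed here.]

```python
# Sketch: build distillation SFT data by filtering teacher samples with the verifier.

import json
from transformers import pipeline

from rlvr_sketch import verifiable_reward  # hypothetical module holding the checker above

# Assumed teacher checkpoint name; in practice this would be the large RL-trained model.
teacher = pipeline("text-generation", model="meta-llama/Llama-3.1-405B-Instruct")

problems = [
    {"question": "What is 7 * 8?", "answer": "56"},
]

sft_rows = []
for item in problems:
    # Sample several candidate solutions from the teacher.
    candidates = teacher(item["question"], max_new_tokens=256,
                         num_return_sequences=4, do_sample=True)
    for cand in candidates:
        completion = cand["generated_text"]
        # Keep only traces the verifier marks as correct (rejection sampling).
        if verifiable_reward(completion, item["answer"]) == 1.0:
            sft_rows.append({"prompt": item["question"], "completion": completion})
            break

# The filtered traces become standard SFT data for the smaller student model.
with open("distill_sft.jsonl", "w") as f:
    for row in sft_rows:
        f.write(json.dumps(row) + "\n")
```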
So we just did a few runs and they tend to work. And it just shows that there's a lot of room for people to play in these things. And they crushed Llama's actual release, right? Like they're way better than it. Yeah. So our eval numbers, I mean, we have extra months on this, but our eval numbers are much better than the Llama instruct model that they released.