Kind of like, yes, adding the human... Designing the perfect Google button. Google's famous for having people design buttons that are so perfect. And it's like, how is AI going to do that? Like, they could give you all the ideas, but...
And humans are actually very good at reading or judging between two things. This goes back to the core of what RLHF and preference tuning is: it's hard to generate a good answer for a lot of problems, but it's easy to see which of two answers is better. And that's how we're using humans for AI now, judging which one is better. And that's what software engineering could look like.
The PR review: here are a few options, here are some potential pros and cons. And humans are going to be the judges.
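To make the preference-judging idea concrete, here is a rough Python sketch of the Bradley-Terry objective that underlies most reward-model training in RLHF. The tensors and values are illustrative placeholders, not from any particular system.

import torch
import torch.nn.functional as F

# Bradley-Terry pairwise loss: train a reward model so the answer a human
# preferred ("chosen") scores higher than the one they passed on ("rejected").
def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Model P(chosen beats rejected) = sigmoid(r_chosen - r_rejected);
    # minimizing the negative log-likelihood widens the margin.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy rewards the model assigned to two candidate answers per prompt.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, 1.1])
print(preference_loss(r_chosen, r_rejected))  # smaller when chosen wins by more

The formulation captures exactly the point above: the human only has to say which of two answers is better, never to produce the better answer themselves.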
I'll explain what a Tulu is. A Tulu is the hybrid camel you get when you breed a dromedary with a Bactrian camel. Back in the early days after ChatGPT, there was a big wave of models coming out, like Alpaca, Vicuna, et cetera, that were all named after various mammalian species. So Tulu, the brand, is multiple years old; it comes from that.
We've been playing at the frontiers of post-training with open-source code. The first part of this release was in the fall, where we built on Llama's open-weight models, and then we added our fully open code and our fully open data. There's a popular benchmark, Chatbot Arena, and that's generally the metric by which these chat models are evaluated.
It's humans comparing random models from different organizations. And if you looked at the leaderboard in November or December, among the top 60 models, from tens of organizations, none of them had open code or data for just the post-training. Among those, even fewer, or none, have pre-training data and code available, but post-training is much more accessible at this time.
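As a rough illustration of how those pairwise human votes become a leaderboard, here is a toy Elo update in Python. Chatbot Arena's actual rating method differs in its details (it has used Bradley-Terry-style fits), so treat the constants and names here as assumptions.

# Toy Elo over anonymous A-vs-B battles. The K-factor and starting ratings
# are illustrative, not Chatbot Arena's exact parameters.
def expected_score(r_a: float, r_b: float) -> float:
    # Probability model A wins, given the current ratings.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    # outcome: 1.0 if A won the human vote, 0.0 if B won, 0.5 for a tie.
    e_a = expected_score(r_a, r_b)
    return r_a + k * (outcome - e_a), r_b + k * ((1.0 - outcome) - (1.0 - e_a))

ratings = {"model_a": 1000.0, "model_b": 1000.0}
for outcome in [1.0, 1.0, 0.0, 1.0]:  # four human judgments
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], outcome)
print(ratings)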
It's still pretty cheap and you can do it. And the question is: how high can we push this number where people have access to all the code and data? So that's the motivation of the project. We draw on lessons from Llama. NVIDIA had a Nemotron model where the recipe for their post-training was fairly open, with some data and a paper.
And we're putting all of these together to try to create a recipe with which people can fine-tune models like GPT-4 to their domain.
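One concrete piece of such a post-training recipe is preference tuning; the Tulu line of work has used direct preference optimization (DPO) among other steps. Below is a minimal sketch of the DPO loss in Python, with placeholder log-probabilities standing in for real model outputs.

import torch
import torch.nn.functional as F

# DPO loss: push the policy to prefer the chosen response over the rejected
# one by a larger margin than a frozen reference model does. beta is an
# illustrative hyperparameter controlling the strength of that pressure.
def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()

# Toy sequence log-probs standing in for outputs of the two models.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)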