Grant Harvey
Podcast Appearances
Russia, and then you'll check that poem and you'll say something about whether that poem is good or bad, and you'll give it a grade, and it'll learn from that.
And so that's reinforcement learning, or you can call it reward modeling: you're basically changing the way you reward your model for different types of answers.
And then evaluation is building the test the model has to take to understand if it's good, because companies will release loads of different versions of models and they've got to understand if each one is better or worse.
Like, I mean, you've seen the news about ChatGPT 5, GPT-5. They released it.
It was obviously better in some metrics, but the audience wasn't happy.
And that's why human evaluation is so necessary because people are not deterministic too.
People like to have opinions on things and you can't just be like, well, this was better on all our benchmarks.
If it feels different to someone and the user doesn't like it, it doesn't matter if it's better.
And that's evaluation.
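As a rough illustration of the point above, here is a minimal sketch of how a benchmark score and a human preference vote can disagree about which model is "better". The model names, scores, and votes are invented for the example and do not come from any real release.

```python
from collections import Counter


def benchmark_winner(scores):
    """Return the model name with the highest benchmark score."""
    return max(scores, key=scores.get)


def preference_win_rate(votes, model):
    """Return the fraction of human votes that preferred `model`."""
    counts = Counter(votes)
    return counts[model] / len(votes)


# Hypothetical numbers, invented purely for illustration.
benchmark_scores = {"model_old": 0.81, "model_new": 0.86}
human_votes = ["model_old", "model_old", "model_new", "model_old", "model_new"]

best_on_benchmark = benchmark_winner(benchmark_scores)
new_win_rate = preference_win_rate(human_votes, "model_new")

print("Best on benchmark:", best_on_benchmark)
print("Share of people who preferred model_new:", new_win_rate)
```

In this toy setup the new model wins the benchmark but loses the human vote, which is the situation the speaker describes.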
So we're kind of, we're the teachers behind the models.
Like if everybody's running off the internet, like we're in trouble.
Ultimately, a model is going to look at its huge data set and then you're going to have a large, an impossibly large set of hyperparameters that you're going to configure to try and understand what the best next token to predict is based upon the data it's looked at.
But ultimately, you've either got to improve the underlying data set, which is hard.
Like the data sets are huge.
They're petabytes of information.
Or you've got to do what's called post-training, where you use sort of smaller sets of data to improve the weightings.
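To make the idea concrete, here is a toy sketch, not how production models work: a word-pair counter stands in for the model, the counts stand in for the weightings, and a small, heavily weighted extra dataset stands in for post-training. The texts and the `weight` parameter are assumptions made up for the example.

```python
from collections import Counter, defaultdict


def train(counts, text, weight=1):
    """Add next-word counts from `text` into `counts`, scaled by `weight`."""
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += weight
    return counts


def predict_next(counts, prev):
    """Return the word most often seen after `prev`."""
    return counts[prev].most_common(1)[0][0]


counts = defaultdict(Counter)

# "Pre-training": learn from the larger base text.
train(counts, "the cat sat on the mat the cat ran")
before = predict_next(counts, "the")

# "Post-training": a much smaller dataset, weighted more heavily,
# shifts the counts without rebuilding the base data.
train(counts, "the dog barked", weight=5)
after = predict_next(counts, "the")

print(before, "->", after)
```

The small second dataset changes the predicted next word without touching the original text, which is the trade-off the speaker is pointing at.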
Like for clarity, by the way, I am not an ML scientist.
So I'm going to say some things and there'll be listeners being like, this guy's a charlatan.
So just for clarity, before I get flamed, I'm giving some vibes here rather than the exact science.
And then I think the evaluation piece is so key.
Like there's all these different models and in lots of ways, like if you go to ChatGPT, Claude, Gemini, and you ask it a reasonably simple question, it's going to give you the right answer pretty much.