Corey Knowles
Podcast Appearances
We're still very focused on the coding use case.
But we're right about at the time where it's interesting to try more.
Yeah, I mean, basically these are tasks that relate to like writing code.
And so what we do is we have, it's kind of like an exam, if you will.
Let me know if this is like too basic of a way to explain it, but basically.
Yeah, yeah.
So like we have these things that we call evals, like evaluations.
And the idea is that it's kind of like an exam.
And when we train a model, we give it this exam, which is like, hey, do you know how to do this very basic task?
Can you navigate to a folder in a terminal?
Okay, what about a harder task?
Can you fix this bug?
Okay, what about an even harder task?
Can you set up my computer in the right way?
Or can you even perform this task?
And so a lot of these evals are inspired by real life because we're trying to measure real life value.
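The exam analogy above can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's actual tooling: `EvalTask`, `run_evals`, `pass_rate`, and `toy_model` are hypothetical names, the two tasks are made up to mirror the "navigate to a folder" and "fix this bug" examples, and the string-matching graders stand in for whatever checks a real eval would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalTask:
    """One 'exam question': a prompt plus a grader for the model's answer."""
    name: str
    difficulty: str
    prompt: str
    check: Callable[[str], bool]  # hypothetical grader: True if the answer passes


def run_evals(model: Callable[[str], str], tasks: List[EvalTask]) -> Dict[str, bool]:
    """Give every task to the model and record pass/fail per task."""
    return {task.name: task.check(model(task.prompt)) for task in tasks}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks passed (0.0 when there are no tasks)."""
    return sum(results.values()) / len(results) if results else 0.0


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: it only knows the basic terminal task."""
    if "navigate" in prompt:
        return "cd projects"
    return "I don't know"


# Made-up tasks of increasing difficulty, mirroring the examples in the talk.
TASKS = [
    EvalTask(
        name="navigate_folder",
        difficulty="basic",
        prompt="navigate to the folder named projects in a terminal",
        check=lambda answer: answer.strip() == "cd projects",
    ),
    EvalTask(
        name="fix_bug",
        difficulty="harder",
        prompt="fix the bug: def add(a, b): return a - b",
        check=lambda answer: "a + b" in answer,
    ),
]

results = run_evals(toy_model, TASKS)
print(results, pass_rate(results))
```

Here the toy model passes the basic task and fails the harder one, so its score is one out of two. A real eval would swap in an actual model call and far more robust graders.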
And so, thinking about freelancers: there are freelance tasks that have been available online for humans to go attempt, and how many of these can a model achieve?
Terminal-Bench is the idea of, if you're... do I have my terminal up?
Yeah.