Corey Knowles
So if you're using the terminal, a very basic task would be, I don't know... I don't actually know if this is a Terminal-Bench task, but this is my standard use case.
It's like, can I use FFmpeg to convert some PNG into some other format, right?
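As a rough illustration of that kind of task, here is a minimal sketch of a PNG conversion driven from Python. The function names `build_convert_command` and `convert` and the file names are hypothetical, not something from the episode or from any benchmark. The sketch assumes the `ffmpeg` binary is on the PATH and relies on FFmpeg picking the output format from the destination file's extension.

```python
import shutil
import subprocess


def build_convert_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command line that converts src into dst.

    -y overwrites dst if it already exists, -i names the input file,
    and ffmpeg infers the output format from dst's extension.
    """
    return ["ffmpeg", "-y", "-i", src, dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_convert_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_convert_command("input.png", "output.jpg"))
```

Calling `convert("input.png", "output.jpg")` would run the actual conversion. The command builder is kept separate so the command can be inspected without FFmpeg installed.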
These are the tasks, and what we do is when we have a new model, we give it the exam, we see what score it gets, and then we compare it to the previous model.
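The "give it the exam and compare" idea can be sketched like this. It is only an illustration under assumptions: each task is treated as a simple pass or fail, the score is the fraction of tasks passed, and the helper names `pass_rate` and `improvement` are made up here. How any real benchmark is actually scored is not described in the episode.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks passed, where results maps task name to pass/fail."""
    if not results:
        raise ValueError("no task results to score")
    # True counts as 1 and False as 0 when summed.
    return sum(results.values()) / len(results)


def improvement(previous: dict[str, bool], new: dict[str, bool]) -> float:
    """Difference in pass rate between the new model and the previous one."""
    return pass_rate(new) - pass_rate(previous)


if __name__ == "__main__":
    # Placeholder outcomes for two hypothetical models; not real benchmark data.
    old_model = {"convert-image": True, "fix-build": False}
    new_model = {"convert-image": True, "fix-build": True}
    print(pass_rate(old_model), pass_rate(new_model), improvement(old_model, new_model))
```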
And so these are some cool things to notice.
Like, oh, cool, this new model, like Codex Max, is significantly better than Codex.
That's wild.
Yeah.
Yeah, I feel like the reputation, or sort of the feedback that we get, for the Codex models is that they're really smart.
And so if you're wondering, why should I try Codex?
I think the simple answer is just you should try it so that you can benefit from the smarter model that can do harder tasks.
I mean, I'm going to show you a bunch of vibe coding demos at the beginning of today just because they're easier and more fun.
But the real work that we do is in more complicated code bases with harder tasks.
And so that's where Codex really shines.
The other cool thing, though, that for me is even more exciting is this graph.
This is the one I wanted to show you.
So we have to explain a little bit with this graph how to read it.
But basically, the takeaway is the model is smarter but also faster.
So, okay, what does this mean?
If you see here this x-axis, this is like how much time the model spent thinking.
So, for example, if I ask Codex, like, to do some task, like, I don't know, let's just do this here.