Kevin Weil
The cool thing is you can fire off 10 of these tasks at once, right?
So we try to actually give you the value of all this parallelism.
It's not just that you can do one thing: if you have a Codex agent working for you, why not have 10 Codex agents working for you on 10 different tasks?
And by the way, just to connect this to the previous topic on evals:
there's a really important subtlety to evals too, which is that they have to be tailored to the product you're trying to build and the problem you're trying to solve.
Coding isn't one thing.
Coding on its own is just one small vertical of the entire world.
But even within coding, you can be good at lots of different kinds of coding.
Codex was a great example of going and saying, okay, what kinds of coding really matter to us?
Of all the tasks that a developer does, which ones do we really want to be good at?
And we created evals for those.
Then, as we trained the model, we made sure to monitor whether it was getting better and better at them.
You go and accumulate tasks and examples for the model to learn from, but you do it against a specific set of evals that correspond to the specific set of problems you want to solve.
Yeah, I think part of this is about making sure, like we talked about earlier, that the user is in control here.
So at some point you should be able to say, hey, you know what?
You've checked enough.
You're good.
The other interesting thing in all of this is that the technology is evolving so quickly, much more quickly than I think we're used to.
We're used to technologies taking decades to deploy and really achieve scale.
One of the phenomena you see with AI is that there'll be some benchmark, some eval, that AI just can't crack.