Kevin Weil
You can talk about creative writing evals though.
And with creative writing, there's no answer.
So how do you grade that, right?
That's one problem.
The other is, as you start to take on more complex tasks, you're not just answering questions, you're actually trying to automate some multi-step workflow, and there may be ambiguity in the right way to do that.
If I'm an AI booking a flight for you, there's not a single way to grade which flight is the correct one, you know.
You also get into these really interesting, challenging, subjective questions of how we actually grade this particular task.
And part of having an eval, if you want to at least automate it, is you need to also have a grader for it so that you can very quickly understand how you're doing on that eval.
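A minimal sketch of the eval-plus-grader idea described above, not a method taken from the conversation. It assumes a model can be treated as a function from prompt to text, and that ambiguity is handled by letting a case accept more than one answer. The names `EvalCase`, `grade`, `run_eval`, and `toy_model`, along with the flight codes, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    acceptable: set[str]  # more than one answer may count as correct


def grade(case: EvalCase, output: str) -> float:
    """Return 1.0 if the output matches any acceptable answer, else 0.0."""
    return 1.0 if output.strip().lower() in case.acceptable else 0.0


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Average grade across all cases; 0.0 when there are no cases."""
    if not cases:
        return 0.0
    return sum(grade(c, model(c.prompt)) for c in cases) / len(cases)


# Hypothetical cases: the first accepts two different flights.
CASES = [
    EvalCase("Book SFO to JFK, arriving before noon", {"ua100", "dl200"}),
    EvalCase("Book SFO to SEA, cheapest fare", {"as300"}),
]


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    return "UA100" if "JFK" in prompt else "AS999"


score = run_eval(toy_model, CASES)
print(f"eval score: {score:.2f}")
```

An exact-match grader like this only works when the acceptable answers can be enumerated. For something like creative writing, `grade` would have to be swapped for a rubric or a human or model judge, which is the harder problem the passage points to.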
So it is interesting.
One of the skills that I think is going to be more and more important for PMs over time is the ability to actually create evals for the products that you're building.
I mean, actually, more than people realize, I think.
I would love to make it over time less of a thing.
And I think over time it is.
If you go back a year or two, everybody was talking about prompt engineering and it was going to be the skill that everybody had to master in order to do anything with AI.
You don't hear it talked about quite as much like that.
And I think that's a good thing.
Ideally, it matters less and less. For any particular user, if they have a question or they want an AI to do something for them, they shouldn't need to get into arcana like, did I use the exact right word?