Zach Lipton
๐ค SpeakerAppearances Over Time
Podcast Appearances
What is it looking at?
And what is it associating with?
And then you also need to engage humans in a kind of ongoing, never-ending process of continually calibrating tasks
the automated metrics that you have to assure that their judgments continue to align with human experts.
So you have a whole apparatus of metrics that are looking for all these very important things.
Some of them are looking at appropriate use of like very
in the weeds aspects of um style of medical language some are targeted to catching omissions some care about misattributions right like is the when the patient says something it should show up in the history of present illness in the subjective section of the conversation when the doctor says something it should show up in the assessment or in the
objective section it's like dawning on me now that that might sound offensive to some patients but you know the the point is there's a whole panel of sort of like attributes of interest you care about accuracy of spelling you care about style you care about organization you care about desiderata that relate to compliance with downstream billing processes and for each of these
you need to construct a judge and you also need to make sure that you're checking in with the ultimate gold standard source of truth, which is both the judgments of human experts and also downstream outcomes.
And this is a massive pursuit that we pour incredible amounts of energy into.
And it is not something where you ever say, no one, neither at a bridge, nor an anthropic, nor an opening, I would ever tell you.
And now, again, we have solved evaluation.
It is a task that requires persistence and like humility.
Yeah.
I think it's all challenging.
I'll get a little bit meta.
I think when you're dealing with a source of truth that is fundamentally, as you find it in the wild, so organic and unstructured.
I think what makes that question hard to answer is when you say...
is there one thing that is hard to deal with?
And like the hardest thing is the meta thing, which is the problem doesn't arrive to you with a enumeration of these are the things.