Zach Lipton
Podcast Appearances
We also care about what fraction of the pertinent details in the audio are captured in the transcript.
That would get at some notion of recall, right?
It's something analogous, like you don't want to have omissions.
And we also have, of everything in the note, what is substantiated in the transcript?
So we have a bunch of metrics that we create around factuality.
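As a rough illustration of the two fractions described above, here is a minimal sketch. It assumes some evaluator has already produced a yes/no judgment per item; the names `factuality_scores`, `captured`, and `substantiated` are hypothetical, and "precision" is our label for the second fraction (the speaker only names recall).

```python
from dataclasses import dataclass


@dataclass
class FactualityScores:
    recall: float     # fraction of pertinent details that were captured
    precision: float  # fraction of note claims substantiated by the transcript


def factuality_scores(captured: list[bool], substantiated: list[bool]) -> FactualityScores:
    """Compute the two fractions from per-item judgments.

    captured[i]: was pertinent detail i captured? (hypothetical input)
    substantiated[j]: is note claim j supported by the transcript? (hypothetical input)
    """
    # Assumption: an empty list counts as a perfect score, since nothing was missed.
    recall = sum(captured) / len(captured) if captured else 1.0
    precision = sum(substantiated) / len(substantiated) if substantiated else 1.0
    return FactualityScores(recall=recall, precision=precision)


# Example: 3 of 4 pertinent details captured, 1 of 2 note claims substantiated.
scores = factuality_scores([True, True, False, True], [True, False])
print(scores)
```

The hard part, as the conversation goes on to say, is producing those per-item judgments in the first place; this sketch only covers the arithmetic once they exist.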
And the question is, well, who's in a position to assess that?
Because it's not as straightforward as, like, "What's positive here? You said negative, therefore wrong."
You know, you have to look at one natural language expression together with another natural language expression and say, is this adequately substantiated by the evidence over here?
So there is a dance where basically what we construct is a system of, sort of, the AI.
Well, you know, it's AI all the way down.
You need an AI system that can evaluate the AI system.
This goes by many names in the field.
You hear people talk about automated evaluation.
You hear people talk about LLM as judge.
Mm-hmm.
And the challenge here is that you can't just simply give an LLM a prompt that says, is it accurate?
You need to be extremely precise in how you structure the task and how the model is supposed to break it down.
What exactly constitutes a claim?
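To make the idea of a precisely structured judging task concrete, here is a minimal sketch, not the speaker's actual system. It assumes the note is split into claims (naively, one per sentence), each claim is judged separately against the transcript, and the judge must answer from a fixed set of verdicts. All names (`Verdict`, `split_into_claims`, `evaluate_note`, `keyword_judge`) are hypothetical, and `keyword_judge` is a toy stand-in for a real LLM call.

```python
from typing import Callable, Literal

# Assumption: a closed set of verdicts instead of a free-form "is it accurate?"
Verdict = Literal["supported", "contradicted", "not_mentioned"]
Judge = Callable[[str, str], Verdict]  # (claim, transcript) -> verdict


def split_into_claims(note: str) -> list[str]:
    # Naive assumption: one claim per sentence. Deciding what really
    # constitutes a claim is the open question raised in the conversation.
    return [s.strip() for s in note.split(".") if s.strip()]


def evaluate_note(note: str, transcript: str, judge: Judge) -> list[tuple[str, Verdict]]:
    # Judge each claim on its own rather than the whole note at once.
    return [(claim, judge(claim, transcript)) for claim in split_into_claims(note)]


def keyword_judge(claim: str, transcript: str) -> Verdict:
    # Toy stand-in for an LLM judge: exact substring match, case-insensitive.
    return "supported" if claim.lower() in transcript.lower() else "not_mentioned"


note = "Patient reports headache. Patient takes aspirin."
transcript = "Doctor: so the patient reports headache since Monday."
for claim, verdict in evaluate_note(note, transcript, keyword_judge):
    print(f"{verdict:>13}: {claim}")
```

Passing the judge in as a function keeps the task structure (claim splitting, per-claim verdicts) separate from whichever model ends up doing the judging.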