Will Bryk
So that's one problem.
There aren't that many standard third-party retrieval evals, and they've been, like, benchmarked.
And they're not really good representations of agentic search, of what agents actually need.
And so it is a problem in the industry where, like, customers can't really know what is true, which is sad.
It also demonstrates the need for a really good search engine to distinguish what is true.
Yeah.
It's just another example.
But yeah, I mean, the ground truth for customers is their own A/B tests.
And so when they do A/B tests, they are literally testing us versus other providers on their use case.
And if they have enough data to do A/B tests, that's the best.
Often they make their own evals.
So sophisticated customers will make their own evals.
Super sophisticated customers will run A/B tests.
And then customers who just want something might just look at evals that are published online.
But certainly for the sophisticated customers, when they test us, it becomes a lot clearer who's on top.
But yeah, we want to improve this ecosystem.
We want to be the research lab that helps improve the ecosystem and publishes things.
And even if we're not at the top, we want to show that.
And so you'll see more coming out there.