The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Generative Benchmarking with Kelly Hong - #728
23 Apr 2025
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
No persons identified in this episode.
This episode hasn't been transcribed yet
Help us prioritize this episode for transcription by upvoting it.
Popular episodes get transcribed faster
Other recent transcribed episodes
Transcribed and ready to explore now
NPR News: 12-08-2025 2AM EST
08 Dec 2025
NPR News Now
NPR News: 12-07-2025 11PM EST
08 Dec 2025
NPR News Now
NPR News: 12-07-2025 10PM EST
08 Dec 2025
NPR News Now
Meidas Health: AAP President Strongly Pushes Back on Hepatitis B Vaccine Changes
08 Dec 2025
The MeidasTouch Podcast
Democrat Bobby Cole Discusses Race for Texas Governor
07 Dec 2025
The MeidasTouch Podcast
Fox News Crashes Out on Air Over Trump’s Rapid Fall
07 Dec 2025
The MeidasTouch Podcast