This episode is sponsored by AGNTCY. Unlock agents at scale with an open Internet of Agents. Visit https://agntcy.org/ and add your support.

How do the world's most powerful AI models get trained and trusted at scale, and what does that really take from data to deployment? In this episode, Appen CEO Ryan Kolln joins Eye on AI to unpack how rigorous human evaluation, culturally aware data, and model-based judges come together to raise real-world performance.

In this episode of Eye on AI, host Craig Smith speaks with Ryan Kolln, CEO of Appen, about building evaluation systems that go beyond static benchmarks to measure usefulness, safety, and reliability in production. They explore how human raters and AI evaluators work in tandem, why localization matters across regions and domains, and how quality controls keep feedback signals trustworthy for training and post-training.

Ryan explains how evaluation feeds reinforcement strategies, where rubric-driven human judgments inform reward models, and how enterprises can stand up secure workflows for sensitive use cases. He also discusses emerging needs around sovereign models, domain-specific testing, and the shift from general chat to agentic workflows that operate inside real business systems.

Learn how leading teams design human-in-the-loop evaluation, when to route judgments from models back to expert reviewers, how to capture cultural nuance without losing universal guardrails, and how to build an evaluation stack that scales from early prototypes to production AI.

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI