As AI agents and multimodal models become more prevalent, understanding how to evaluate GenAI is no longer optional – it's essential. Generative AI introduces new complexities in assessment compared to traditional software, and this week on Chain of Thought we’re joined by Chip Huyen (Storyteller, Tép Studio) and Vivienne Zhang (Senior Product Manager, Generative AI Software, Nvidia) for a discussion on AI evaluation best practices.

Before we hear from our guests, Vikram Chatterji (CEO, Galileo) and Conor Bronsdon (Developer Awareness, Galileo) give their takes on the complexities of AI evals and how to overcome them, covering objective criteria for evaluating open-ended tasks, the role of hallucinations in AI models, and the importance of human-in-the-loop systems.

Afterwards, Chip and Vivienne sit down with Atin Sanyal (Co-Founder & CTO, Galileo) to explore common evaluation approaches, best practices for building frameworks, and implementation lessons. They also discuss the nuances of evaluating AI coding assistants and agentic systems.

Chapters:
00:00 Challenges in Evaluating Generative AI
05:45 Evaluating AI Agents
13:08 Are Hallucinations Bad?
17:12 Human in the Loop Systems
20:49 Panel discussion begins
22:57 Challenges in Evaluating Intelligent Systems
24:37 User Feedback and Iterative Improvement
26:47 Post-Deployment Evaluations and Common Mistakes
28:52 Hallucinations in AI: Definitions and Challenges
34:17 Evaluating AI Coding Assistants
38:15 Agentic Systems: Use Cases and Evaluations
43:00 Trends in AI Models and Hardware
45:42 Future of AI in Enterprises
47:16 Conclusion and Final Thoughts

Follow:
Vikram Chatterji: https://www.linkedin.com/in/vikram-chatterji/
Atin Sanyal: https://www.linkedin.com/in/atinsanyal/
Conor Bronsdon: https://www.linkedin.com/in/conorbronsdon/
Chip Huyen: https://www.linkedin.com/in/chiphuyen/
Vivienne Zhang: https://www.linkedin.com/in/viviennejiaozhang/

Show notes:
Watch all of Productionize 2.0: https://www.galileo.ai/genai-productionize-2-0