
Two Voice Devs

Episode 227 - LLM Evaluation: Choosing the RIGHT Model

14 Feb 2025

Description

Are you overwhelmed by the sheer number of Large Language Models (LLMs) available? Choosing the right LLM for your project isn't about picking the most popular one; it's about understanding your specific needs and rigorously evaluating your options.

In this episode of Two Voice Devs, Allen Firstenberg and guest host Brad Nemer, a seasoned product manager, dive deep into the world of LLM evaluation. They go beyond the marketing buzz and explore practical tools and strategies for making informed decisions.

Whether you're a developer, a product manager, or just curious about the practical applications of LLMs, this episode provides invaluable insights into making the right choices for your projects. Don't get caught up in the hype; learn how to evaluate LLMs effectively!

More info: https://www.udacity.com/blog/2025/01/how-to-choose-the-right-ai-model-for-your-product.html

[00:00:00] Introduction: Meet Brad Nemer
[00:00:38] Brad's Journey to Product Management & AI
[00:03:12] Collaboration with Noble Ackerson and the LLM Evaluation Challenge
[00:05:23] The Role of a Product Manager
[00:07:43] How Product Managers Relate to Engineering
[00:13:46] Exploring Evaluation Tools: Hugging Face
[00:16:58] Exploring Evaluation Tools: Chatbot Arena (Human Evaluation)
[00:20:30] Chatbot Arena: Code Generation Evaluation
[00:24:43] Evaluating LLMs: Beyond Chatbots and Truth
[00:26:11] Exploring Evaluation Tools: Artificial Analysis (Quality, Speed, Price)
[00:28:47] Exploring Evaluation Tools: Galileo (Hallucination Report)
[00:31:16] Case Study: DeepSeek and the Importance of Contextual Evaluation
[00:34:53] The Future of LLM Testing and Quality Assurance
[00:37:49] Wrap-Up and Contact Information

#LLM #LargeLanguageModels #AIEvaluation #ProductManagement #TechTalk #TwoVoiceDevs #HuggingFace #GenAI #GenerativeAI #ChatbotArena #ArtificialAnalysis #Galileo #DeepSeek #ChatGPT #Gemini #Mistral #Claude #ModelSelection #AIdevelopment #SoftwareDevelopment #Testing #QA #RAG #MachineLearning #NLP #Coding #TechPodcast #YouTubeTech #Developers
