How To Design And Build Machine Learning Systems For Reasonable Scale

Description

Summary

Using machine learning in production requires a sophisticated set of cooperating technologies. A majority of resources that are available for understanding how to design and operate these platforms are focused on either simple examples that don’t scale, or over-engineered technologies designed for the massive scale of big tech companies. In this episode Jacopo Tagliabue shares his vision for "ML at reasonable scale" and how you can adopt these patterns for building your own platforms.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.
Your host is Tobias Macey and today I’m interviewing Jacopo Tagliabue about building "reasonable scale" ML systems.

Interview

Introduction
How did you get involved in machine learning?
How would you describe the current state of the ecosystem for ML practitioners? (e.g. tool selection, availability of information/tutorials, etc.)
What are some of the notable changes that you have seen over the past 2 – 5 years?
How have the evolutions in the data engineering space been reflected in/influenced the way that ML is being done?
What are the challenges/points of friction that ML practitioners have to contend with when trying to get a model into production that isn’t just a toy?
You wrote a set of tutorials and accompanying code about performing ML at "reasonable scale". What are you aiming to represent with that phrasing?
There is a paradox of choice for any newcomer to ML. What are some of the key capabilities that practitioners should use in their decision rubric when designing a "reasonable scale" system?
What are some of the common bottlenecks that crop up when moving from an initial test implementation to a scalable deployment that is serving customer traffic?
How much of an impact does the type of ML problem being addressed have on the deployment and scalability elements of the system design? (e.g. NLP vs. computer vision vs. recommender system, etc.)
What are some of the misleading pieces of advice that you have seen from "big tech" tutorials about how to do ML that are unnecessary when running at smaller scales?
You also spend some time discussing the benefits of a "NoOps" approach to ML deployment. At what point do operations/infrastructure engineers need to get involved?
What are the operational aspects of ML applications that infrastructure engineers working in product teams might be unprepared for?
What are the most interesting, innovative, or unexpected system designs that you have seen for moderate scale MLOps?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ML system design and implementation?
What are the aspects of ML systems design that you are paying attention to in the current ecosystem?
What advice do you have for additional references or research that ML practitioners would benefit from when designing their own production systems?

Contact Info

jacopotagliabue on GitHub
Website
LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

The Post-Modern Stack: ML At Reasonable Scale
Coveo
NLP == Natural Language Processing
RecList
Part of speech tagging
Markov Model
YDNABB (You Don’t Need A Bigger Boat)
dbt (Data Engineering Podcast Episode)
Seldon
Metaflow (Podcast.__init__ Episode)
Snowflake
Information Retrieval
Modern Data Stack
SQLite
Spark SQL
AWS Athena
Keras
PyTorch
Luigi
Airflow
Flask
AWS Fargate
AWS Sagemaker
Recommendations At Reasonable Scale
Pinecone (Data Engineering Podcast Episode)
Redis
KNN == K-Nearest Neighbors
Pinterest Engineering Blog
Materialize
OpenAI

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

AI Engineering Podcast
