Cloud Native Testing Podcast
How Cloudflare Tests in Production for Unbreakable Performance
28 Jul 2025
In this illuminating episode, we pull back the curtain on Cloudflare's intricate approach to testing at a truly global scale. Join us as Sachin from Cloudflare's team shares invaluable insights into how they leverage their vast, distributed network to push the boundaries of reliability and performance. We'll explore the critical role of SLOs (Service Level Objectives) and SLIs (Service Level Indicators) in measuring Cloudflare's service reliability, and why error budgeting is essential for managing risk across their massive infrastructure.

Discover how Cloudflare embraces the often-debated, yet crucial, practice of testing in production to validate real-world performance. Learn how observability becomes their indispensable guide to understanding complex system behavior under immense load. We'll also discuss the power of automated monitoring in catching anomalies, the importance of collaborating with SLO teams within Cloudflare, and how continuous improvement is fueled by accurate metrics and data analysis gleaned from their unique testing environment. Whether you're a DevOps engineer, SRE, or simply curious about how an internet giant ensures its services stand strong, this episode offers actionable strategies and a fascinating look into Cloudflare's blueprint for resilience.