
Reliability Enablers

Technology Business


Episodes

You (and AI) can't automate reliability away

02 Dec 2025

Contributed by Lukas

What if the hardest part of reliability has nothing to do with tooling or automation? Jennifer Petoff explains why real reliability comes from the hum...

#67 Why the SRE Book Fails Most Orgs — Lessons from a Google Veteran

15 Jul 2025

Contributed by Lukas

A new or growing SRE team. A copy of the book. A company that says it cares about reliability. What happens next? Usually… not much. In this episode,...

#66 - Unpacking 2025 SRE Report’s Damning Findings

01 Jul 2025

Contributed by Lukas

I know it’s already six months into 2025, but we recorded this almost three months ago. I’ve been busy with my foray into the world of tech consul...

#65 - In Critical Systems, 99.9% Isn’t Reliable — It’s a Liability

17 Jun 2025

Contributed by Lukas

Most teams talk about reliability with a margin for error. “What’s our SLO? What’s our budget for failure?” But in the energy sector? There is...

#64 - Using AI to Reduce Observability Costs

28 Jan 2025

Contributed by Lukas

Exploring how to manage observability tool sprawl, reduce costs, and leverage AI to make smarter, data-driven decisions. It's been a hot minute since t...

#63 - Does "Big Observability" Neglect Mobile?

12 Nov 2024

Contributed by Lukas

Andrew Tunall is a product engineering leader focused on pushing the boundaries of reliability with a current focus on mobile observability. Using his...

#62 - Early YouTube SRE shares Modern Reliability Strategy

05 Nov 2024

Contributed by Lukas

Andrew Fong’s take on engineering cuts through the usual role labels, urging teams to start with the problem they’re solving instead of locking in...

#61 Scott Moore on SRE, Performance Engineering, and More

22 Oct 2024

Contributed by Lukas

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com

#60 How to NOT fail in Platform Engineering

01 Oct 2024

Contributed by Lukas

Here’s what we covered:
Defining Platform Engineering
* Platform engineering: Building compelling internal products to help teams reuse capabilities w...

#59 Who handles monitoring in your team and how?

24 Sep 2024

Contributed by Lukas

Why many copy Google’s monitoring team setup
* Google’s Influence. Google played a key role in defining the concept of software reliability.
* Succe...

#58 Fixing Monitoring's Bad Signal-to-Noise Ratio

17 Sep 2024

Contributed by Lukas

Monitoring in the software engineering world continues to grapple with poor signal-to-noise ratios. It’s a challenge that’s been around since the ...

#57 How Technical Leads Support Software Reliability

10 Sep 2024

Contributed by Lukas

The question then condenses down to: Can technical leads support reliability work? Yes, they can! Anemari has been a technical lead for years — even...

#56 Resolving DORA Metrics Mistakes

04 Sep 2024

Contributed by Lukas

We're already well into 2024 and it’s sad that people still have enough fuel to complain about various aspects of their engineering life. DORA seem...

#55 3 Uses for Monitoring Data Other Than Alerts and Dashboards

27 Aug 2024

Contributed by Lukas

We’ll explore 3 use cases for monitoring data. They are:
* Analyzing long-term trends
* Comparing over time or experiment groups
* Conducting ad hoc re...

#54 Becoming a Valuable Engineer Without Sacrificing Your Sanity

20 Aug 2024

Contributed by Lukas

Shlomo Bielak is the Head of Engineering (Operational Excellence and Cloud) at Penn Interactive, an interactive gaming company. He’s dedicated much ...

#53 What's Missing in Incident Response Processes?

15 Aug 2024

Contributed by Lukas

Incident response is an increasingly difficult area for organizations. Many teams end up paying a lot of money for incident management solutions. Howe...

Can ITIL Benefit from Site Reliability Engineering?

13 Aug 2024

Contributed by Lukas

According to Vlad Ukis, there are many enterprises whose IT functions are organized around ITIL. What you use SRE for is something complete...

#52 Navigating Complexity within Incidents

06 Aug 2024

Contributed by Lukas

Sonja Blignaut is a complexity expert. That might not sound relevant to incident response in reliability engineering. But it is! Our systems are becomi...

#51 Whitebox vs Blackbox Monitoring

30 Jul 2024

Contributed by Lukas

Have you got complete monitoring of your software in effect? Are you sure? Google's SREs break monitoring down to white box versus black box monitorin...

#50 Making Better Sense of Observability Data

09 Jul 2024

Contributed by Lukas

Jack Neely is a DevOps observability architect at Palo Alto Networks and has a few interesting ways of extracting value from o11y data. We crammed into...

#49 Alert Fatigue is Still an Issue - Here's How We Fix it

02 Jul 2024

Contributed by Lukas

Alert noise is no joke and neither is the fatigue that results from it. I spoke with Dan Ravenstone who gave a talk at Monitorama about this very topi...

#48 Cutting Down "Toil" aka Manual Work in Software

25 Jun 2024

Contributed by Lukas

Sebastian and I scoured Chapter 5 of the Site Reliability Engineering (2016) book to find nuggets of wisdom on how to reduce toil. We hit the jackpot w...

#47 How to Grow Team Impact Through Learning Culture

18 Jun 2024

Contributed by Lukas

The common refrain after an incident is “We could and should learn from this”. To me, that alludes to the need for a robust learning culture. We mi...

#46 Platform Team Design According to Team Topologies

11 Jun 2024

Contributed by Lukas

I continue my conversation with Manuel Pais, co-author of the seminal Team Topologies book, about team topologies suitable for reliability teams. In thi...

#45 How Team Topologies Can Guide Enabling Teams

04 Jun 2024

Contributed by Lukas

I got the inside word from Manuel Pais, co-author of the seminal Team Topologies book, to explain in a 2-part series about 2 of the most relevant team...

#44 - Making SLOs Matter to Stakeholders

30 May 2024

Contributed by Lukas

Bonus episode on SLOs because Sebastian and I felt that we did not cover the why of SLOs or how to make them relevant to stakeholders. This is a public epis...

#43 - SLOs: a Deeper Dive into their Mechanics

28 May 2024

Contributed by Lukas

This episode continues our coverage of Chapter 4 of the Site Reliability Engineering book (2016). In this second part, we take a deeper dive into the ...

#42 - Hitting Software SLA Targets through SLOs and SLIs

21 May 2024

Contributed by Lukas

In this first part of a 2-part coverage, Sebastian Vietz and I work out how to meet SLAs through SLOs and SLIs. This episode covers Chapter 4 of the S...

#41 Curbing High Observability Costs

14 May 2024

Contributed by Lukas

No one wants to get Coinbase’s $65 million observability bill in the future. Sure, observability comes with a necessary cost. But that cost cannot e...

#40 How to Enable Observability for Success

07 May 2024

Contributed by Lukas

Observability is more than a set of technologies. It’s a practice. Timothy Mahoney is no stranger to this practice, enabling many developer teams to...

#39 How Chaos Engineering Helps Reduce Incident Risk

30 Apr 2024

Contributed by Lukas

Chaos Engineering is no longer a nice-to-have, as Ananth Movva explains in this episode of the SREpath podcast. His experiences with it drove a reduce...

#38 The Real Cost of Software Reliability & Downtime

23 Apr 2024

Contributed by Lukas

This episode covers Chapter 3 of the Site Reliability Engineering book (2016). In this second part, we talk about the costs behind reliability and cho...

#37 An SRE Approach to Managing Technology Risk

16 Apr 2024

Contributed by Lukas

This episode covers Chapter 3 of the Site Reliability Engineering book (2016). In this first part, we talk about embracing risk from the SRE perspecti...

#36 Avoiding Critical Platform Engineering Mistakes

09 Apr 2024

Contributed by Lukas

Platform engineering is replacing SRE and DevOps. Jokes aside, knowing the path to better platforms is key. Abby Bangser is the right person to tell u...

#35 Boosting Your Observability Data's Usability

02 Apr 2024

Contributed by Lukas

The observability (o11y) data revolution is well underway, but are we getting the most from the data that is being collected? Richard Benwell thinks we...

#34 From Cloud to Concrete: Should You Return to On-Prem?

26 Mar 2024

Contributed by Lukas

This episode continues our coverage of Chapter 2 of the Site Reliability Engineering book (2016). We talk about the age-old debate of cloud vs on-prem...

#33 Inside Google's Data Center Design

19 Mar 2024

Contributed by Lukas

This episode covers Chapter 2 of the Site Reliability Engineering book (2016). In this first part, we talk about the intricacies of data center design...

#32 Clarifying Platform Engineering's Role (with Ajay Chankramath) BONUS EP

14 Mar 2024

Contributed by Lukas

Will Platform Engineering replace DevOps or SRE or both? I don’t think this is the case at all. Neither does Ajay Chankramath. He is the Head of Plat...

#31 Introduction to FinOps (with Ajay Chankramath)

12 Mar 2024

Contributed by Lukas

FinOps is on the tip of many tongues in the software space right now, as we try to curb our cloud costs. Ajay Chankramath has given talks on FinOps at...

#30 Clearing Delusions in Observability (with David Caudill)

07 Mar 2024

Contributed by Lukas

Observability is going through interesting times. David Caudill believes that delusions are getting in the way of our success in this area. He's a...

#29 - Reacting to Google's SRE book 2016 (Chapter 1 Part 2)

27 Feb 2024

Contributed by Lukas

Sebastian and I continue our breakdown of notable passages from Chapter 1 of Google's Site Reliability Engineering (2016) book by Betsy Beyer, Jen...

#28 - Reacting to Google's SRE Book 2016 (Chapter 1 Part 1)

20 Feb 2024

Contributed by Lukas

Sebastian and I got together to react to and discuss 5 passages from Chapter 1 of Google's Site Reliability Engineering book (2016) by Betsy Beyer...

#27 - Growing as a Site Reliability Engineer (Part 3)

13 Feb 2024

Contributed by Lukas

Third and final instalment of the Growing as an SRE series, covering practical ideas for planning your career progression. This is a public episode. If ...

#26 - Growing as a Site Reliability Engineer (Part 2)

08 Feb 2024

Contributed by Lukas

In part 1, we covered the first truth: that you don't grow in your career merely through tenure. That was a simple one. Let's explore 2 mor...

#25 - DORA and the Pursuit of Engineering Excellence (with Tim Wheeler)

30 Jan 2024

Contributed by Lukas

DORA metrics are a hot topic among technology executives in all kinds of enterprises. But there's more to engineering culture than solely relying o...

#24 - Growing as a Site Reliability Engineer (Part 1)

23 Jan 2024

Contributed by Lukas

How can you grow as an SRE? You've probably thought about your career progression at some point. Ash put together his initial thoughts on this top...

#23 - The Danger of Unreliable Platforms (with Jade Rubick)

16 Jan 2024

Contributed by Lukas

Jade Rubick needs no introduction in the reliability and observability space. He was VP of Engineering at New Relic from 2010 to 2019. It was my pleas...

#22 - How Google does SRE Consulting (with Yury Niño Roa)

09 Jan 2024

Contributed by Lukas

I did not know that Google itself does consulting around its SRE practices. This is not a sponsored episode LOL! I wanted to talk with my SRE friend, ...

#21 - Better SRE in 2024 is all we can hope for

02 Jan 2024

Contributed by Lukas

Sebastian is back for this episode to help set our direction for 2024. We reflected during the holidays on the problems SREs faced in 2023 in terms of...

#20 Holiday Special with Stephen Townshend

19 Dec 2023

Contributed by Lukas

Join Ash Patel and Stephen Townshend for a friendly chat about what they've learned in SRE as 2023 comes toward a wrap! This is a public episode. ...

#19 How to Develop Early Career Engineers (with John Hyland)

12 Dec 2023

Contributed by Lukas

Ash Patel talks with John Hyland, who ran the Ignite Program at New Relic, which is dedicated to developing early-career engineers. John shares insights...

#18 Winning at SRE in Banking and Telecom (with Troy Koss)

05 Dec 2023

Contributed by Lukas

Ash Patel talks with Troy Koss, who is the Director of SRE at Capital One, an early adopter of DevOps and SRE in the banking sector. He shares insights ...

#17 Lessons from SRE's Wild West Days (with Rick Boone)

27 Nov 2023

Contributed by Lukas

Ash Patel talks with Rick Boone who is a pioneer in SRE, having been an early AppOps engineer at Facebook and Uber's first SRE hire. He shares ama...

#16 Acing Cloud Infra in Digital Media Giant (with Sreejith Chelanchery)

21 Nov 2023

Contributed by Lukas

Ash Patel interviews Sreejith Chelanchery, who is SVP of Delivery and Infrastructure Engineering at Dotdash Meredith. Sreejith shares his journey from pr...

#15 Growing Reliability Engineering Across 5+ Companies (with Nash Seshan)

14 Nov 2023

Contributed by Lukas

Ash Patel talks with Nash Seshan, who has supported reliability work in over 5 organizations, including Cisco, eBay, Dropbox, Lyft, Netflix, and Wayfa...

#14 Faster Incident Resolution through Data-Driven Notebooks (with Ivan Merrill)

07 Nov 2023

Contributed by Lukas

Ash Patel talks with Ivan Merrill of Fiberplane about wrangling the big data that incidents and systems generate through collaborative notebooks. Ivan...

#13 Making Sense of OpenTelemetry and Observability (with Adriana Villela)

31 Oct 2023

Contributed by Lukas

Ash Patel talks with Adriana Villela (CNCF Ambassador, OpenTelemetry contributor, and senior developer advocate at Lightstep) about the promise of Ope...

#12 From Incident Firefighting to Reliability First (with Robert Ross)

24 Oct 2023

Contributed by Lukas

Ash Patel talks with Robert Ross of FireHydrant about his experience in offering incident management software to SREs and other software incident resp...

#11 Rising to Staff Engineer in DevOps and SRE (with Rajesh Reddy N)

17 Oct 2023

Contributed by Lukas

Ash Patel interviews Rajesh Reddy N about his experiences as a senior DevOps and SRE individual contributor. Rajesh shares his insights on having syst...

#10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)

10 Oct 2023

Contributed by Lukas

Ash Patel interviews Kyle Forster of RunWhen about his experiences as an ex-Google director helping SREs and running an AI-based company that supports...

#9 Inside Booking.com's Site Reliability Engineering practice (with Samuele Tonon and Yoann Fouquet)

02 Oct 2023

Contributed by Lukas

In this episode of the SREpath Podcast, Ash Patel interviews two SRE managers from Booking.com, Samuele and Yoann, to gain insights into their experie...

#8 Software Reliability Ninja Who is NOT an SRE (with Pablo Bouzada)

11 Sep 2023

Contributed by Lukas

Ash Patel interviews Pablo Bouzada about his beliefs on software reliability as a non-SRE leader. They discuss the importance of effective leadership ...

What happened to the podcast?

05 Sep 2023

Contributed by Lukas

We haven't hit hard times; we've just been doing other things for the last 2 months, including making plans for more interesting episodes on this podcast! Thi...

#7 Bringing HR onboard with SRE hiring and onboarding

13 Jul 2023

Contributed by Lukas

In this episode, we highlight the importance of engaging with HR partners to establish an effective understanding of the SRE career model. This will a...

#6 Building a successful SRE practice through capabilities

29 Jun 2023

Contributed by Lukas

We discuss the need for a framework to guide the development of Site Reliability Engineers (SREs) and drive value for organizations. You will learn ab...

#5 Where does SRE fit into your organization's structure?

15 Jun 2023

Contributed by Lukas

We discuss throughout this episode the different engagement models for Site Reliability Engineering (SRE) and how to contextualize SRE into an organiz...

#4 Should organizations care about SRE?

01 Jun 2023

Contributed by Lukas

This episode discusses how Site Reliability Engineering (SRE) can be important to organizations. SRE can optimize software operations, reduce costs, s...

#3 SRE vs DevOps vs Platform Engineering

17 May 2023

Contributed by Lukas

In this episode of SREpath, Ash and Sebastian discuss the unnecessary debate surrounding Site Reliability Engineering (SRE), DevOps, and platform engi...

#2 What is Site Reliability Engineering (SRE) and what is not SRE?

04 May 2023

Contributed by Lukas

In this episode of the SREpath podcast, Ash and Sebastian explore what Site Reliability Engineering (SRE) is and how it manifests in a highly function...

#1 Introducing the SREpath podcast

20 Apr 2023

Contributed by Lukas

Welcome to the first episode of the SREpath podcast! In this episode, we'll introduce you to our podcast hosts and give you their broad-level view...