Incident Status: On Hold w/special guest Will Gallego
28 Nov 2025
Contributed by Lukas
Mentioned multiple times, Em Ruppe’s amazing talk on incident severity: https://www.usenix.org/con...
Complex Systems and the Messy Nine w/special guests Dave Woods and John Allspaw
13 Nov 2025
Contributed by Lukas
The writeup on the AWS outage from AWS themselves, if you haven’t seen it: https://aws.amazon.com/...
All the things about Incident Command
30 Oct 2025
Contributed by Lukas
It’s Spamton G (not J) Spamton, Clint! Get hip to the game characters! https://deltarune.fand...
Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
16 Oct 2025
Contributed by Lukas
A history of the 5 whys and root cause analysis from papers. Some critiques of the 5 whys: From John Al...
First Stories/Second Stories
02 Oct 2025
Contributed by Lukas
More robustness than resilience, but worth repeating that you should always check your earthquake go...
How (Not) to Introduce Resilience Engineering at Work with special guest Michelle Casey
18 Sep 2025
Contributed by Lukas
Lorikeets are pretty: https://en.wikipedia.org/wiki/Rainbow_lorikeet You think Colette’s kidding ab...
How long should you wait after an incident to do your retro?
25 Jul 2025
Contributed by Lukas
Corn sweat is a real thing: https://www.scientificamerican.com/article/humidity-from-corn-sweat-inte...
Lund University - Academic Theory and Practice
10 Jul 2025
Contributed by Lukas
A huge thanks to our panelists: John Allspaw, Jed Needle, Chad Todd. RISF and TiF will ho...
What’s the ROI on Reliability and Resilience work?
27 Jun 2025
Contributed by Lukas
Dave Woods’s talk at SRECon 25 was on Complexification and SRE: https://www.youtube.com/watch?v=lmB...
Runbooks: the Good, Bad and Ugly w/special guest Andrew Hatch
03 Jun 2025
Contributed by Lukas
You can register for the After-the-Episode chat with Andrew at https://resilienceinsoftware.org/netw...
What is an incident? How come no one declares them?
21 May 2025
Contributed by Lukas
Michael Wettick’s Lund thesis is great, and Laura Maguire’s paper on the Costs of Coordination t...
Chaos Engineering w/special guest Casey Rosenthal
07 May 2025
Contributed by Lukas
The O’Reilly book on Chaos Engineering by Casey and Nora Jones is here: https://www.oreilly.com/li...
Burnout on Aisle 3
26 Apr 2025
Contributed by Lukas
Clint wrote the Socio-Technical Reality Engineer as a blog post; it’s a good read. The Burnout book ...
Resilience, Complexity, and Your Boss: a collab w/Punk Rock Safety
09 Apr 2025
Contributed by Lukas
Ben (Goodheart), Dave (Provan) and Ron (Gantt) have the very awesome podcast Punk Rock Safety (punkr...
Live From SRECon
28 Mar 2025
Contributed by Lukas
No video for this one because it didn’t really end up working. We had some awesome people with us f...
Teaser Episode - Season 2
12 Mar 2025
Contributed by Lukas
The XKCD comic that’s in Colette’s thesis is Dependency. Justin Reock is at DX. https://punkrocksafe...
Episode 10 - When They Go Full ITIL on You w/special guest John Allspaw
20 Feb 2025
Contributed by Lukas
You can find John at Adaptive Capacity Labs or his (old) blog at Kitchen Soap. ITIL is… well,...
Episode 9 - Learning from Incidents with special guest Alex Elman
12 Feb 2025
Contributed by Lukas
You can find ACL (Adaptive Capacity Labs), the folks who train software engineers how to do LFI and ...
Episode 8 - Why Human Factors and Not Technical Ones
29 Jan 2025
Contributed by Lukas
The spicy Allspaw take that inspired our listener is here: https://www.linkedin.com/posts/jallspaw_a...
Episode 7 - AI and Resilience with special guest Courtney Nash
22 Jan 2025
Contributed by Lukas
The VOID is one of our favorite things! Some of Courtney’s inoculation of the MTTR virus can be ...
Episode 6 - Can You Buy Resilience? With Special Guest Steve McGhee
08 Jan 2025
Contributed by Lukas
Steve is the host of the Google SRE Prodcast; you should check it out! Colette got her chickens from ...
Episode 5 - Curating Your Resilience Engineering 101
22 Dec 2024
Contributed by Lukas
We talk about our favorite recommendations for someone who's just getting into this whole resilience...
Episode 4 - A Look at the 2024 DORA Report
11 Dec 2024
Contributed by Lukas
Fred’s wonderful blog. This year’s DORA report. Lee, Ramsey & Hicks on productivity and performa...
Episode 3 - Lions, Tigers and Metrics, Oh My!
04 Dec 2024
Contributed by Lukas
We answered a set of questions about how to deal with dashboards and MTTR and how to make the best o...
Episode 2 - Does Software Need Safety?
21 Nov 2024
Contributed by Lukas
We talk to the pioneer of resilience engineering in the software world, John Allspaw, about how he dis...
Episode 1 - Every Second Counts
07 Nov 2024
Contributed by Lukas
The introduction episode of This is Fine! A podcast about resilience engineering in the software wor...