
Entra.Chat

Inside Entra Resilience: Microsoft's Outage War Stories, Backup Secrets and Preventing Global Outages

23 Aug 2025

Description

In this episode, I sit down with my boss, Tarek Dawoud, to pull back the curtain on what really happens during a major service outage. Tarek shares some incredible "war stories" from his time in the trenches, from the early days of DirSync where the team had to edit a sync file with a debugger to prevent an incident, to the massive outages of 2017 and 2018 that changed everything. We'll give you a peek into the high-stakes, quick-thinking world of a "live site" incident and reveal the groundbreaking engineering principles like cell-based architecture and the backup authentication service that were born from these challenges, making Entra more resilient than ever before.

Subscribe with your favorite podcast player or watch on YouTube 👇

About Tarek Dawoud

Tarek Dawoud is a Lead Architect in the Customer Engineering team for Microsoft Entra. With years of experience growing up in Entra engineering, he has been involved in his share of outages and has a deep understanding of what it takes to build and maintain a resilient, hyperscale identity service.

LinkedIn - https://www.linkedin.com/in/tarekdawoud/

🔗 Related Links

* SLA performance for Microsoft Entra ID - aka.ms/entraidsla
* Microsoft Blames "Severe Weather" for Azure Cloud Outage
* Microsoft Probes Cause of Global Web Outage
* Microsoft's Azure AD authentication outage: What went wrong

📗 Chapters

00:57 What is a "Live Site"?
14:15 The Secret to Entra's Uptime: Cell-Based Architecture
18:09 How Entra Routes Your Login Request Globally
24:46 War Story #1: The 2017 Conditional Access Outage
29:52 War Story #2: How a Hurricane & an Office Bug Caused Chaos
43:39 The Backup Auth Service: Entra's Secret Weapon
57:54 Does the Backup Service Kick in Automatically?
01:04:16 Regional Isolation & The Power of Managed Identity
01:08:17 Anatomy of a Near-Outage in 2021
01:12:02 How Microsoft's Culture Learns From Mistakes

Podcast Apps

🎙️ Entra.Chat - https://entra.chat
🎧 Apple Podcast → https://entra.chat/apple
📺 YouTube → https://entra.chat/youtube
📺 Spotify → https://entra.chat/spotify
🎧 Overcast → https://entra.chat/overcast
🎧 Pocketcast → https://entra.chat/pocketcast
🎧 Others → https://entra.chat/rss

Merill's socials

📺 YouTube → youtube.com/@merillx
👔 LinkedIn → linkedin.com/in/merill
🐤 Twitter → twitter.com/merill
🕺 TikTok → tiktok.com/@merillf
🦋 Bluesky → bsky.app/profile/merill.net
🐘 Mastodon → infosec.exchange/@merill
🧵 Threads → threads.net/@merillf
🤖 GitHub → github.com/merill

Get full access to Entra.News - Your weekly dose of Microsoft Entra at entra.news/subscribe
