The Peterman Pod

AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker

13 Apr 2026

Transcription

Chapter 1: What is the main topic discussed in this episode?

0.031 - 5.803 Marc Brooker

If you aren't doing it hands-on, your opinion about it is very likely to be completely wrong.

6.484 - 18.228 Ryan Peterman

This is Marc Brooker. He's a distinguished engineer at AWS, and I interviewed him for technical learnings from his career. 3,000 cloud system postmortems. I wanted to ask you, what makes a good postmortem?

18.589 - 20.453 Marc Brooker

I could spend a lot of time talking about that.

Chapter 2: What insights can we gain from 3,000 cloud system postmortems?

20.787 - 41.408 Ryan Peterman

You had a tweet that said that there are cases where caches are bad. I prefer to see the teams around me avoiding caching where possible. We also discussed how software engineering is changing. What is important given that code is kind of flowing like water now? The job changes and you do different work.

42.289 - 72.093 Ryan Peterman

For someone who's structuring their career, would you say it's better to be overrated or underrated? Here's the full episode. At some point when I was a very junior engineer, I looked at the more senior engineers. So what is the difference between you and I? I'm working more hours than you. I'm landing more code than you. Why is it that you're so much more impactful than I am?

72.153 - 85.945 Ryan Peterman

And then I realized that kind of the direction of your work, like what is the thing that you're actually shipping matters more than the volume of your work and your contributions. What would be your advice on how do you find problems that matter?

Chapter 3: Why might caching be detrimental in software engineering?

87.325 - 108.216 Marc Brooker

Yeah, I think you have to go super broad. So I think there's a set of those things that come in from customers, from the world, right? Like here is an unsolved problem. I spend a lot of time meeting with AWS customers and listening to them talk about what are the things they still find difficult in our space? What are they investing in? Where are they spending their time?

108.296 - 129.682 Marc Brooker

Where would they prefer not to be spending their time, so they can focus on their core business instead? And so that's one rich seam of ideas and focus on what's interesting. I think completely at the other level is looking at the technical trends, and you can look at just the kind of speeds and feeds like, wow, networks have gotten faster.

Chapter 4: How is AI transforming the landscape of software engineering?

129.703 - 154.319 Marc Brooker

Storage has gotten faster. We've seen this huge explosion in multi-core and now in GPUs. And so there's a bottom-up innovation trend there too, which you can also look at and say, well, this enables all of these new things. And then broadly kind of across the world, like what are the big trends that are going on?

Chapter 5: What advice does Marc Brooker have for junior engineers in the age of AI?

154.419 - 177.711 Marc Brooker

What are the things that are changing in our industry? What are the things that are changing in the world? And really it is those kind of moments of change that bring with them the opportunity to build things and to recognize problems. And so to pick one concretely, when I was working in the Lambda team in 2020,

178.079 - 191.919 Marc Brooker

I was talking to a lot of customers about, you know, they were super excited about building on serverless. They were super excited about building on containers. There had been this massive shift and what people were seeing then was, wow, I love these serverless products.

191.999 - 199.87 Marc Brooker

I love building this way, but the world of data and especially relational data doesn't fit super well into this paradigm, right?

Chapter 6: What considerations should senior engineers make regarding their impact?

199.89 - 223.327 Marc Brooker

These relational databases are still very serverful, you know, fantastically powerful products, but not kind of operationally the same. And, you know, that thinking was, you know, just felt super important to me of like, wow, these customers have brought to me a gift of understanding something that's really important. And so I joined the Aurora team. We built Aurora serverless.

223.367 - 225.511 Marc Brooker

And then we built Aurora DSQL.

Chapter 7: Why is writing important for engineers, according to Marc Brooker?

225.992 - 255.388 Marc Brooker

You know, we've been investing deeply across all of our database products to make them a better fit for these customers' serverless and container workloads. And that is an example of a trend that was brought by a customer. But then also these trends that have been driven by kind of architecture or by other things going on, right? Faster networks, faster compute, faster connectivity.

255.908 - 279.805 Marc Brooker

And so one of the big technical trends in the database world right now is this sort of block storage becoming the default backend, the default durability layer for databases of all kinds, from analytics workloads to online workloads. And there's been this incredible explosion around that. And so if you look at what we did with Aurora DSQL, for example,

280.038 - 290.063 Marc Brooker

That was very much learning from that trend and taking a lead in that trend and saying, well, we're going to make S3, this block store that we built 20 years ago.

Chapter 8: What technical book recommendations does Marc suggest for engineers?

290.084 - 316.356 Marc Brooker

Sorry, object store that we built 20 years ago, the underlying durability layer of this new database. But obviously it doesn't have the latency properties or the rich interface that an online database needs. And so we're going to build an architecture on top of that that deals with all of these other things in a much better way, but doesn't have to worry about durability.

316.336 - 337.686 Marc Brooker

And, you know, so that was this perfect collision of a set of things I was hearing from customers and a set of things that were technical trends coming together and thinking, wow, we've got this opportunity to build something now that is going to be a market leading product that would be hard to imagine without either of those input signals.

338.881 - 358.323 Ryan Peterman

I saw something that you wrote. You mentioned that you were on call for 15 years somewhere in there. And I've heard many stories of more senior engineers negotiating out of on-call because per unit time, it could be perceived as not that impactful. And so why did you stay on call for so long?

359.364 - 378.068 Marc Brooker

I would say that the majority of my in-practice knowledge about how to build distributed systems has come from being on call and analyzing and deeply understanding these postmortems and COEs.

378.318 - 404.713 Marc Brooker

One of the challenges of running a company like AWS and running large-scale systems is that folks come out of college with often great knowledge of computer science fundamentals, great programming skills, great mathematical skills. All of that stuff is fantastic, but without the grounded knowledge of what it actually means to run and understand systems.

405.166 - 431.616 Marc Brooker

And on-call is one of the best ways to learn those things, best ways to see how do systems really run? How do they really behave? How do customers really use them? What happens when customers use systems in unexpected ways? How can we make systems more resilient to customers using them in different ways? And I think that should be almost a goal of on-call, right?

431.656 - 445.618 Marc Brooker

If you have folks in your teams who are on-call and they're just closing the same ticket over and over and over, well, you know, that's where you need to just build some automation. And again, building automation is easier than ever. It's more powerful than ever. Fantastic.

445.784 - 472.551 Marc Brooker

But where you really want to spend the time of the deep experts on your team is, here's something unexpected or unusual that's happened in the system. Let's deeply understand that and let's bring that knowledge back to both improving that system and communicating broadly to the company and the outside community what we've learned from that.

473.948 - 502.117 Marc Brooker

And so one of the most powerful things we do at AWS is we have this mechanism of a very broad weekly meeting where we all get together, engineers from across AWS, leaders, senior leaders from across AWS, and talk about COEs, these postmortems that we write, and what we can learn from them and how we can apply those lessons across the whole company.
