Marc Brooker
It's more powerful than ever.
Fantastic.
But where you really want to spend the time of the deep experts on your team is, here's something unexpected or unusual that's happened in the system.
Let's deeply understand that and let's bring that knowledge back to both improving that system and communicating broadly to the company and the outside community what we've learned from that.
And so one of the most powerful things we do at AWS is we have this mechanism of a very broad weekly meeting where we all get together, engineers from across AWS, leaders, senior leaders from across AWS, and talk about COEs, these postmortems that we write, and what we can learn from them and how we can apply those lessons across the whole company.
And I think that particular mechanism, that particular kind of Wednesday morning meeting that we have, is one of the things that has been a core, almost causal factor behind AWS's success, because it has allowed us to and forced us to spend leadership bandwidth, to spend expertise, to spend the time of our best engineers deeply understanding how our systems operate and why they operate the way they do.
And that level of being just extremely grounded in reality helps you design better products, helps you architect better systems, helps you think more clearly about the next round of things, helps you fix issues.
And so it's this fundamental kind of learning exercise.
It's a real blessing.
So I would recommend on call to anybody who wants to learn about the practice of distributed systems, and I would certainly recommend spending time reading COEs, reading postmortems, and deeply reflecting on not only what can we fix tactically, but what can we fix organizationally and strategically, and what kind of tools might need to exist to prevent this kind of thing happening again.
And you asked earlier about where do ideas come from?
This is another fantastic kind of flow of ideas of saying, wow, we seem to be solving this same problem over and over in different ways and getting it slightly wrong every time.
Can we extract a tool to do that?
Can we build a service around that?
Can we build a feature around that to make it easier for us to get right and easier for our customers to get right?
Yeah, and again, I think for me, it comes down to optimizing for finding the most important things to work on.
And if you aren't close to operating your actual system and you don't know how it's actually working,