AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker
I think automation should be doing that kind of work, but I think there's fantastic value in deep understanding, deep investigations, and deep reflection on what you learn from post-mortems and COEs.
So I think what makes a really great post-mortem is first really getting into the details and making sure that you deeply understand what happened rather than just assuming what happened based on the biases you bring in.
And so lesson one there is: if you can't understand what happened, well, that teaches you something about your logging and metrics and observability and simulations and all of these other things.
And then once you deeply understand what happened, a great post-mortem steps through the whys behind that at multiple levels, right?