AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker
Like, we don't need to fix these root causes because our on-calls are superheroic, and they're going to stay up all night, and they're going to hack around things, and they don't mind being paged a hundred times a week.
We're just making this fantastically expensive investment of taking all of these people and their strong ownership and their expertise and spending them just on this break-fix cycle.
And that's where you need to kind of look at it from the outside and say, well, let's take this energy of this team, fantastic energy, and focus it on improving the service, getting out of the cycle, finding new things to fix, finding new things to build.
to realize that, oh no, we're actually caring about it at the wrong level and we're not serving our business in the best possible way by being so narrowly and tactically focused on this break-fix cycle.
Like it's, hey, I'm going to take these core ideas from computer science of temporal and spatial locality, and I'm going to exploit those to make my system faster, scale better, et cetera.
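As a rough illustration of exploiting temporal locality (recently used items tend to be used again soon), here is a minimal sketch of a least-recently-used cache. It is not something described in the talk: the `LRUCache` class, the `slow_lookup` function, and the access pattern are all hypothetical, chosen only to show the idea. Spatial locality (nearby items tend to be used together) is not covered by this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self.entries:
            # Temporal locality: a recently used key is likely to be used
            # again, so mark it as the most recently used entry.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


def slow_lookup(key):
    # Stand-in for an expensive operation (assumed for this example).
    return key * 2


cache = LRUCache(capacity=2)
# Key 1 is requested repeatedly, so it stays cached while key 2 is evicted.
for key in [1, 1, 2, 1, 3, 1]:
    cache.get(key, slow_lookup)

print(cache.hits, cache.misses)
```

With this access pattern, half of the requests are served from the cache even though it holds only two entries, because the repeated key keeps getting refreshed.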