AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker
Similarly, if you look at a more classical relational database design like Aurora, the Aurora leader is constantly telling the potential failover targets: here's something you should cache, here's something you should cache, here's something you should cache.
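To make the idea of a leader warming its failover targets concrete, here is a minimal sketch in Python. It is not Aurora's actual protocol. The `Node` and `LRUCache` classes, the `warm` method, and the dictionary standing in for shared storage are all hypothetical names chosen for illustration. The only point it shows is that a standby which has been told what to cache can serve reads from a warm cache immediately after taking over.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        return None

    def put(self, page_id, data):
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used


class Node:
    """A database node with a local cache over shared storage (assumed model)."""

    def __init__(self, name, storage, capacity=3):
        self.name = name
        self.storage = storage      # dict standing in for the storage layer
        self.cache = LRUCache(capacity)
        self.standbys = []          # failover targets to keep warm

    def read(self, page_id):
        """Return (data, cache_hit) and hint every standby to cache the page."""
        data = self.cache.get(page_id)
        hit = data is not None
        if not hit:
            data = self.storage[page_id]
            self.cache.put(page_id, data)
        for standby in self.standbys:
            standby.warm(page_id)   # "here's something you should cache"
        return data, hit

    def warm(self, page_id):
        """Load a page into the local cache because the leader suggested it."""
        if self.cache.get(page_id) is None:
            self.cache.put(page_id, self.storage[page_id])


storage = {i: f"page-{i}" for i in range(10)}
leader = Node("leader", storage)
standby = Node("standby", storage)
leader.standbys.append(standby)

for page in (1, 2, 3):
    leader.read(page)

# Simulated failover: the standby now serves reads from an already warm cache.
data, hit = standby.read(2)
print(data, hit)
```

In this toy model the standby never sees a cold-cache period after failover, which is the property the hints are meant to buy. A real system would also have to bound the hint traffic and handle a standby that falls behind, neither of which is modeled here.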
Or use a scalable backend, DSQL or DynamoDB or whatever your favorite scalable database is, and keep your database vendor honest about getting to the scale and performance you need rather than putting a cache in front of things.
the biggest, most impactful system postmortems across the industry, I would say that these kinds of metastable failures have been an underlying cause in probably a majority of them.
And it's super important that as an industry and as a community of practice, we understand those things deeply because also those cases where these do happen,