Marc Brooker

Speaker
499 total appearances

Podcast Appearances

The Peterman Pod
AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker

Similarly, if you look at a more, let's say, classical relational database design like Aurora, the Aurora leader is constantly telling the potential failover targets, here's something you should cache, here's something you should cache, here's something you should cache.

So when a failover happens, the cache is warm on the failover target.
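A minimal sketch of that idea, for illustration only: a leader tells its standbys which keys it has just touched, so a standby's cache is already warm when it takes over. The `Leader`, `Standby`, and `on_hint` names and the dict-backed `storage` are hypothetical, and this is not a description of how Aurora is actually implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache shared by the leader and the standbys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


class Standby:
    """Hypothetical failover target that accepts cache hints from the leader."""

    def __init__(self, storage, capacity):
        self.storage = storage
        self.cache = LRUCache(capacity)

    def on_hint(self, key):
        # "Here's something you should cache": load it before it is needed.
        self.cache.put(key, self.storage[key])

    def read(self, key):
        value = self.cache.get(key)
        hit = value is not None
        if not hit:
            value = self.storage[key]
            self.cache.put(key, value)
        return value, hit


class Leader:
    """Hypothetical leader that serves reads and forwards hints to standbys."""

    def __init__(self, storage, standbys, capacity):
        self.storage = storage
        self.standbys = standbys
        self.cache = LRUCache(capacity)

    def read(self, key):
        value = self.cache.get(key)
        hit = value is not None
        if not hit:
            value = self.storage[key]
            self.cache.put(key, value)
        for standby in self.standbys:
            standby.on_hint(key)  # keep failover targets warm
        return value, hit
```

After the leader has read a key once, a read of the same key on the standby is a cache hit, so a failover does not start from a cold cache.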

And so those are the kinds of things that you can do to avoid those modalities.

But in general, I wouldn't extract this as a rule or say that this applies 100% of the time.

But in general, I prefer to see the teams around me avoiding caching where possible.

I prefer patterns where you have a, let's say, complete materialized view of the data if you need very fast access to it.

Especially if it's slow moving, just pull it down onto your local machine and work with it in memory.

If it's only being updated once a week, who cares?

Just make lots of copies of it.

So that's one pattern.
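As a rough sketch of the "complete materialized view" pattern described above: each process holds a full in-memory copy of a slow-moving dataset and swaps in a fresh copy on a schedule. The `LocalSnapshot` class and the `load_all` callable are hypothetical names for illustration; how the full dataset is fetched and how often it is refreshed are assumptions left to the caller.

```python
import threading


class LocalSnapshot:
    """Full in-memory copy of a slow-moving dataset (hypothetical helper)."""

    def __init__(self, load_all):
        # load_all is an assumed callable returning the whole dataset as a mapping.
        self._load_all = load_all
        self._lock = threading.Lock()
        self._data = dict(load_all())

    def refresh(self):
        # Build the new copy first, then swap it in under the lock.
        fresh = dict(self._load_all())
        with self._lock:
            self._data = fresh

    def get(self, key, default=None):
        # Every lookup is answered locally. A missing key means "not in the
        # dataset", not "go ask the backend", so there is no miss path.
        with self._lock:
            return self._data.get(key, default)
```

Unlike a cache, the load this places on the source depends only on how often `refresh` runs, not on the hit rate, so there is no cold-start mode in which the backend suddenly sees all of the read traffic.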

Or use a scalable backend, DSQL or DynamoDB or whatever your favorite scalable database is, and keep your database vendor honest about getting to the scale and performance you need rather than putting a cache in front of things.

So caching isn't a bad pattern, but it is a pattern with some significant downsides that are really best avoided.

Yeah, it's not super common, right?

Like you might go years without seeing something like that.

But if you look across the biggest, most impactful system postmortems across the industry, I would say that these kinds of metastable failures have been an underlying cause in probably a majority of them.

And it's super important that as an industry and as a community of practice, we understand those things deeply because also those cases where these do happen, you know, tend to be larger scale issues, longer recovery time issues, and more complex to fix issues, right?
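One way to see why a cache can create this kind of second, stuck mode is a toy capacity check. This is a simplified model, not something stated in the episode: it assumes the backend is sized for the traffic that gets past a warm cache, and the function names and numbers are made up for illustration.

```python
def backend_load(offered_rps, hit_rate):
    """Requests per second that miss the cache and reach the backend."""
    return offered_rps * (1.0 - hit_rate)


def is_sustainable(offered_rps, hit_rate, backend_capacity_rps):
    """True if the backend can absorb the cache misses at this hit rate."""
    return backend_load(offered_rps, hit_rate) <= backend_capacity_rps


# Illustrative numbers only (assumptions, not measurements).
OFFERED_RPS = 10_000
BACKEND_CAPACITY_RPS = 2_000

warm_ok = is_sustainable(OFFERED_RPS, 0.95, BACKEND_CAPACITY_RPS)
cold_ok = is_sustainable(OFFERED_RPS, 0.0, BACKEND_CAPACITY_RPS)
print(f"warm cache sustainable: {warm_ok}")
print(f"cold cache sustainable: {cold_ok}")
```

In this model the same offered load is fine with a warm cache and unsustainable with a cold one. If the overloaded backend cannot answer enough requests to refill the cache, the system can stay in the bad state even after the original trigger is gone.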