The Peterman Pod
AWS Distinguished Eng: Learning From 3000 Incidents And How Engineering Is Changing | Marc Brooker
And so, you know, obviously very attractive.
But...
The downside of caches, especially in distributed systems, is they have this mode.
There's a mode where the cache is full and the cache is full of the right data in time and space to perform very well.
And there's a mode where the cache is empty or contains the wrong data.
And in the first mode, the system is fast and happy and healthy.
In the second mode, the system is slow, often down, because now the backend isn't scaled to deal with all of this uncached traffic.
Customers are very disappointed.
And often it is down in a stable way.
And this is this kind of idea of metastable failures where the system has switched from state one to state two.
And in state two, it's still stable, right?
Like, it's down, but it's not going to come back up under its own energy because, for example,
all of this traffic is causing a huge amount of contention in my database or is saturating the network.
And so I can't even refill the cache.
It's not even getting the right kind of data in.
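The two stable states described above can be sketched with a toy model. This is only an illustration under invented assumptions, not something from the conversation: the request rate, backend capacity, expiry rate, refill gain, and the quadratic collapse in backend goodput under overload are all hypothetical numbers chosen to make the effect visible.

```python
def step(hit_ratio, offered=1000.0, capacity=150.0, expiry=0.05, fill_gain=0.001):
    """Advance a toy cache model by one time step and return the new hit ratio.

    All parameters are hypothetical and chosen only for illustration.
    """
    misses = offered * (1.0 - hit_ratio)  # traffic that falls through to the backend
    if misses <= capacity:
        goodput = misses  # healthy backend: every miss is answered
    else:
        # Assumed overload behavior: contention makes useful work
        # collapse as the excess load grows.
        goodput = capacity * (capacity / misses) ** 2
    # Successful backend responses refill the cache; cached entries expire.
    new_ratio = hit_ratio + fill_gain * goodput - expiry * hit_ratio
    return min(1.0, max(0.0, new_ratio))


def settle(hit_ratio, steps=500):
    """Run the model long enough for the hit ratio to stop moving."""
    for _ in range(steps):
        hit_ratio = step(hit_ratio)
    return hit_ratio


warm = settle(0.95)  # start with a full cache holding the right data
cold = settle(0.0)   # start with an empty cache

print(f"started warm, settled at hit ratio {warm:.2f}")
print(f"started cold, settled at hit ratio {cold:.2f}")
```

Under these assumptions the same system, with the same traffic, settles at a high hit ratio when it starts warm and stays stuck at a low one when it starts cold, because the overloaded backend cannot answer enough misses to refill the cache.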
And so, when I talk about the downsides of caches, it's really about how we avoid that bimodality: how do we keep the fast mode, the value of caches, and avoid the state where we're down?
And so if I go back to DSQL, our answer there is that what we call the storage tier is essentially a cache, but it is a complete cache.
It contains every row in the database.
And so it doesn't have this mode where I have to ask how to recover from it being empty or containing the wrong data.
It contains all of the data.
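The contrast between a partial cache and a complete one can be sketched as follows. This is a generic illustration and not DSQL's actual implementation: the class names, the `apply_write` method, and the idea of keeping the complete copy current by applying every write are assumptions made for the example.

```python
class PartialCache:
    """Look-aside cache: any miss falls through to the backend."""

    def __init__(self, backend):
        self.backend = backend
        self.entries = {}
        self.backend_reads = 0

    def get(self, key):
        if key not in self.entries:
            # Miss path: this is the traffic that can overload the backend.
            self.backend_reads += 1
            self.entries[key] = self.backend[key]
        return self.entries[key]

    def clear(self):
        """Simulate a restart or eviction storm that empties the cache."""
        self.entries.clear()


class CompleteCache:
    """Holds every row, so reads have no miss path at all."""

    def __init__(self, rows):
        self.entries = dict(rows)
        self.backend_reads = 0  # stays at zero: reads never touch a backend

    def apply_write(self, key, value):
        # Hypothetical: the copy is kept current by applying every write.
        self.entries[key] = value

    def get(self, key):
        return self.entries[key]


backend = {i: f"row-{i}" for i in range(1000)}

partial = PartialCache(backend)
for key in backend:
    partial.get(key)   # cold start: every read goes to the backend
partial.clear()        # cache lost
for key in backend:
    partial.get(key)   # the full load lands on the backend again

complete = CompleteCache(backend)
for key in backend:
    complete.get(key)  # served locally, no backend reads

print(partial.backend_reads, complete.backend_reads)
```

In this sketch the partial cache sends the whole read load to the backend every time it is emptied, while the complete cache has no empty or wrong-data state to recover from.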