Marc Brooker
broadly that class of problems.
So folks can say, hey, I'm going to build on DSQL and just not have this whole class of problems.
And I think a really powerful outer loop of the post-mortem process is to say, how do we turn all of these lessons into new services and into service improvements?
How do you prevent misbehaving clients from being a problem for the database?
Yeah, so in DSQL's case, we have no pessimistic locking.
And so within the scope of a transaction, all of the reads happen using a mechanism called multiversion concurrency control, where for every row in the database we store a history of versions.
And so you can read an old version of a row without blocking writers and saying, hey, you can't update this because I just read it.
And then locally within the query processor that's handling a connection, we spool the writes, and when you get to commit time we do this optimistic check: can I commit this transaction?
And so combining those two mechanisms of having multiversion concurrency control and the scale-out storage that comes with it and the commit time optimistic checks, we can strongly say that there is no way that a reader of a piece of data can block other writers, and there's no way that a writer of data can block readers.
Writers can affect other writers, but only by changing data, not just by looking at it.
To be precise, writers can't block writers, but they can prevent other writers' transactions from eventually committing by making a bunch of changes.
And that is inherent to the definition of the particular database isolation level.
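The two mechanisms described in this answer, snapshot reads over row versions and a commit-time optimistic check on locally spooled writes, can be sketched in a few lines of Python. This is a toy single-process model, not DSQL's implementation: the names `Store`, `Transaction`, and `ConflictError`, the logical clock, and the rule "fail if a written key changed after the snapshot" are all assumptions made for illustration.

```python
class ConflictError(Exception):
    """Raised when the optimistic commit-time check fails (hypothetical name)."""


class Store:
    """Toy multiversion store: each key keeps a history of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}  # key -> [(commit_ts, value), ...] in commit order
        self.clock = 0      # logical timestamp, bumped on each writing commit

    def begin(self):
        # A transaction reads from the snapshot as of its start timestamp.
        return Transaction(self, start_ts=self.clock)

    def read_at(self, key, ts):
        # Return the newest version committed at or before ts.
        # Nothing is locked, so readers never make writers wait.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_commit_ts(self, key):
        history = self.versions.get(key)
        return history[-1][0] if history else 0


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # writes are spooled locally until commit

    def read(self, key):
        if key in self.writes:  # see our own spooled writes first
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def write(self, key, value):
        self.writes[key] = value  # no lock taken, shared state untouched

    def commit(self):
        if not self.writes:
            return  # read-only transactions have nothing to check
        # Optimistic check: fail if any key we wrote was changed by a
        # transaction that committed after our snapshot was taken.
        for key in self.writes:
            if self.store.last_commit_ts(key) > self.start_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            history = self.store.versions.setdefault(key, [])
            history.append((self.store.clock, value))
```

In this model a reader that started before a writer committed keeps seeing the old version and the writer commits without waiting. Two transactions that both write the same key never wait on each other either, but whichever commits second fails the check and has to retry.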
Yeah, it's actually surprisingly small.
And it's surprisingly small because if you look at the access patterns for most online databases, even ones that do a lot of write traffic, that write traffic tends to be quite concentrated.
And it's quite unusual for an online database workload or even an analytics workload to make a second version of every row in the database.
Typically what it's doing is making a, you know, first, second, third, the hundredth version of this row and a 50th version of that row, but the vast majority of data isn't changing.
And so it's super workload dependent, as is everything in the database world, but the overhead tends to be relatively small.
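A rough back-of-the-envelope version of that argument, as a sketch: count the extra versions stored beyond one per row and divide by the number of rows. The function name `version_overhead`, the row counts, and the assumptions that all versions are the same size and that old versions are never cleaned up are illustrative choices, not figures from the conversation.

```python
def version_overhead(total_rows, versions_by_hot_row):
    """Extra stored versions as a fraction of total rows.

    Assumes every version has the same size and that no old version is
    ever cleaned up, so this is an upper bound for the toy model.
    """
    extra = sum(count - 1 for count in versions_by_hot_row.values())
    return extra / total_rows


# Hypothetical table of one million rows where writes are concentrated:
# one row is on its 100th version and another on its 50th.
concentrated = version_overhead(1_000_000, {"row_a": 100, "row_b": 50})

# The unusual case: every row in a small table has a second version.
everything = version_overhead(4, {"r1": 2, "r2": 2, "r3": 2, "r4": 2})
```

With writes concentrated on a couple of rows, the extra storage is a tiny fraction of the table, while rewriting every row once doubles it.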
I would say it's unusual for...