Martin Kleppmann
Podcast Appearances
Its successors, in the form of Spark and Flink, for example, are used.
And so we still reference MapReduce in the second edition, but more as a learning tool in order to understand how these kinds of partitioned, sharded batch processing systems work.
So that's one thing where we've been able to reduce the coverage.
But other areas where we've increased the coverage are, for example, systems in support of AI.
And so even though this is not an AI book, there are still data systems concerns that arise when you need to support AI applications. A classic one is vector indexes, for example.
And so we've added some coverage of vector indexes to the storage engine chapter.
They fit in really well there because that chapter already covers various different indexing strategies anyway.
And so a vector index is just another indexing strategy.
We also added some coverage of data frames, for example.
That's not an exclusively AI thing, but data frames are quite a good data representation for training data, for example.
And that was not one of the data models that we discussed in the first edition, but we decided to add it to the second edition because it has actually become a very important data model that people are using alongside all of the classic data models like relational and graph and JSON documents and so on.
And so there are these places where we've just expanded the coverage a bit to reflect the kinds of systems people are building, for example, to support AI without it changing the direction of the book entirely.
Absolutely.
Yeah.
So the motivation for putting in an ethics section there in the first edition was that I just felt it had been quite ignored as a concern during my time in industry.
Especially in startups, people were very focused on building a product that their customers would love, and really deprioritizing these sorts of ethical questions in the process.
And so, for example, with consumer-facing products, it might be that the products are very much geared towards essentially data harvesting, collecting behavioral data, because that's what can be monetized in the form of advertising, and there seemed to be just very little reflection on what was good and bad about these sorts of things.
So I really just wanted to encourage a bit of thinking there.