The Pragmatic Engineer

Designing Data-intensive Applications with Martin Kleppmann

22 Apr 2026

1h 25m

16040 words

2 speakers

Description

Brought to You By:

• Statsig – The unified platform for flags, analytics, experiments, and more.
• Sonar – The makers of SonarQube, the industry standard for automated code review.
• WorkOS – Everything you need to make your app enterprise ready.

Martin Kleppmann is a researcher and the author of Designing Data-Intensive Applications, one of the most influential books on modern distributed systems. As of this month, the second, heavily updated edition of the book is out.

In this episode of Pragmatic Engineer, we discuss Martin’s career in tech building startups, how he ended up writing this iconic book, and what he’s focused on now after moving into academia.

We talk about the tradeoffs behind modern infrastructure, how the cloud has changed what it means to scale, and the thinking behind Designing Data-Intensive Applications, including what’s changing in the second edition.

Martin reflects on lessons from building startups like Rapportive, which he sold to LinkedIn, and shares how his experience in both academia and industry shaped his perspective.

We also explore what’s ahead: why formal verification may become more important in an AI-assisted world, the challenges of building local-first software, and his recent research into using cryptography to improve transparency in supply chains without exposing sensitive data.

Timestamps

(00:00) Early career
(05:46) Building Rapportive
(10:47) Working at LinkedIn
(14:09) Writing Designing Data-Intensive Applications
(23:00) Reliability, scalability, and repeatability
(26:24) DDIA: the second edition
(30:50) Tradeoffs of using cloud services
(39:02) How the cloud changed scaling
(42:53) The trouble with distributed systems
(49:02) Ethics for software engineers
(52:45) Formal verification
(1:00:12) Academia vs. industry
(1:03:50) Local-first software
(1:09:50) Computer science education
(1:18:32) Martin’s current research and advice

The Pragmatic Engineer deepdives relevant for this episode:

• Building Bluesky: a distributed social network
• Inside Uber’s move to the cloud
• The history of servers, the cloud, and what’s next
• The past and future of modern backend practices
• How Kubernetes is built

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected]. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Chapters

1. What inspired Martin Kleppmann to write Designing Data-Intensive Applications?
2. How did Martin's experience at LinkedIn shape his views on distributed systems?
3. What are the major updates in the second edition of Designing Data-Intensive Applications?
4. What trade-offs should engineers consider when using cloud services?
5. How has the definition of scalability evolved with cloud technologies?
6. What challenges arise from using decentralized systems?
7. How does formal verification play a role in modern software engineering?
8. What are the implications of AI on software engineering and education?

Featured

Martin Kleppmann

Gergely Orosz

Topics

LinkedIn, Y Combinator, Designing Data-Intensive Applications, Apache Kafka

Transcription

Chapter 1: What inspired Martin Kleppmann to write Designing Data-Intensive Applications?

0.031 - 17.556 Gergely Orosz

Designing Data-Intensive Applications has been the go-to book for anyone building large backend systems. Nine years after publishing this book, the second edition is here. Martin Kleppmann is the author of this generational book. I sat down with him and today we cover how working on Kafka at LinkedIn directly shaped the ideas that became the first edition of the book.

17.576 - 31.875 Gergely Orosz

What's new in the second edition and why things like MapReduce got removed from this updated version, formal methods, local-first software, decentralized access, and many more. If you care about how large systems work, where they're heading, and what the fundamentals are that don't change, this episode is for you.

32.075 - 45.913 Gergely Orosz

This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. This episode is brought to you by Sonar. Sonar, the makers of SonarQube, understands that code quality is about more than just avoiding syntax errors.

Chapter 2: How did Martin's experience at LinkedIn shape his views on distributed systems?

46.534 - 69.411 Gergely Orosz

It's about long-term maintainability by protecting the structural integrity of the system. As agents generate code at massive scale, they often ignore your system's structural integrity. This creates tangles, duplicated code, and other maintainability issues. These issues turn a modular design into a big ball of mud, making it increasingly difficult to extend.

69.652 - 85.956 Gergely Orosz

But here's something that's really helpful, SonarQube's architecture management. It moves architectural governance out of static wikis and into your automated workflow. It allows you to visualize your current architecture, define architectural boundaries, and manage architectural issues in real time.

86.297 - 105.688 Gergely Orosz

Whether it's a human or an AI agent at the keyboard, Sonar acts as a circuit breaker for structural decay. It ensures every commit respects the system's blueprint, protecting the long-term health of your most complex applications. Head to sonarsource.com/pragmatic to find out more. So Martin, welcome to the podcast. Hi, Gergely. It's great to be here.

Chapter 3: What are the major updates in the second edition of Designing Data-Intensive Applications?

106.409 - 123.677 Gergely Orosz

It's amazing to have you here. I don't think you need an introduction for many software engineers, including myself. You're the author of this iconic book that I've had on my bookshelf for probably about 10 years, not much longer after it came out. Before we get into this book, which we're going to talk about, how did you get into the technology field?

123.742 - 134.064 Martin Kleppmann

Yes, well, I did undergraduate computer science like many others. And then after that, I wasn't quite sure what to do with my life. But I thought, well, it's like starting a startup seems like an interesting thing to try.

Chapter 4: What trade-offs should engineers consider when using cloud services?

134.104 - 153.935 Martin Kleppmann

So I started a startup having no clue what I was going to actually do and then spent the first while searching around for things that might be interesting. The first startup didn't work out that well, but through that I met some others who then became my co-founders for the second startup, which worked better. And we sold that one to LinkedIn.

154.195 - 173.939 Martin Kleppmann

And then after that, I started being interested in like teaching these distributed systems concepts. So that's when I got into writing the book. And then during the writing of the book, I also switched over from industry back to academia. Can we talk a little bit about your first and second startup? Yeah, Go Test It, this was like 2008 or something like that.

173.959 - 180.291 Martin Kleppmann

It was the age where people were having real difficulties getting their JavaScript working cross-browser.

Chapter 5: How has the definition of scalability evolved with cloud technologies?

180.391 - 186.943 Martin Kleppmann

Internet Explorer was still pretty big at the time. Chrome had just come out. All the browsers were incompatible with each other.

Chapter 6: What challenges arise from using decentralized systems?

186.983 - 192.814 Martin Kleppmann

And so Go Test It was a cross-browser automated testing service for websites.

192.794 - 220.152 Martin Kleppmann

It was based on Selenium, an open source project that still exists, and the idea is you would write test scripts that automate the user clicking through the various interactions with a website and then just check that the right behavior happens. And so, yeah, it was based on Selenium, but just provided as a hosted service, so people wouldn't have to run various VMs with various operating systems themselves. It worked technically, but I found it really hard to actually get adoption for it. A lot of

220.132 - 242.933 Martin Kleppmann

people building websites, in theory, said, "Oh yeah, this is great, we need to test cross-browser," and in practice it was actually really difficult to get them to integrate it into their workflow and just get in the habit of using it and investing in writing the test scripts. So that ended up not really going anywhere. So it's like there wasn't a business to be done, or revenue to be generated in a meaningful sense?

243.352 - 260.517 Martin Kleppmann

Yeah, well, there's at least one other, maybe two other companies from that same era that did manage to make a business. Sauce Labs is one that managed to actually succeed. But even for them, it was a pretty slow running business. I think it was not an easy business to be in.

Chapter 7: How does formal verification play a role in modern software engineering?

260.497 - 271.166 Gergely Orosz

And for the startup, were you in the UK building it? I was in the UK at the time, yes. And was it bootstrapped? Did you raise some kind of funding? How big was the team? How can we imagine this?

271.567 - 290.503 Martin Kleppmann

It was mostly bootstrapped. So I did a bunch of consulting in order to fund hiring some people and then hired some friends on the cheap to help contribute to actually building the product. And so it was done all very cheaply. I had a very small amount of angel money in there, but mostly bootstrapped.

290.483 - 296.37 Gergely Orosz

And then when you decided to not go forward with this, how did the next startup come?

Chapter 8: What are the implications of AI on software engineering and education?

296.931 - 297.672 Gergely Orosz

Rapportive, right?

297.892 - 316.896 Martin Kleppmann

Yeah, the second one was Rapportive. That went a lot better. So that was putting social media inside Gmail, basically. So the idea was that if you get an email from someone you don't know, we had a little browser extension which manipulated the Gmail web interface so that on the side next to the email, we'd show you a summary social profile with like a

316.876 - 333.147 Martin Kleppmann

profile picture and like a job title pulled from LinkedIn and recent tweets pulled from Twitter and maybe recent Facebook posts or things like that. Just whatever we could find about that person and put that as a social summary next to the email. We

333.127 - 354.63 Martin Kleppmann

started in 2010 or something like that. It then pretty quickly became quite popular, and so on the back of that we were then able to raise some money from Y Combinator, which was still fairly young at the time. That was very young. You must have been one of the very early batches. Yeah, I can't remember exactly when they started, but it was certainly in the early years.

354.83 - 360.319 Martin Kleppmann

I think Y Combinator had already built up quite a good reputation at the time, but it was still fairly small.

360.64 - 369.755 Gergely Orosz

And then as part of Y Combinator, did you have to fly from the UK to San Francisco to attend that 10-week program, if I remember? Exactly, yes.

369.815 - 384.476 Martin Kleppmann

So we initially came for the three months or whatever it was of the Y Combinator, but then we were able to get US work visas for ourselves and set up permanently in San Francisco.

384.696 - 391.485 Gergely Orosz

How was that shift from the UK, where you spent going to university, your first startup, the first part of this, to coming to San Francisco?

391.525 - 414.737 Martin Kleppmann

It was very exciting because it felt like going to the center of where it was all happening, really. And at the start of that, not knowing anybody at all, we knew like one or two people in the entire Bay Area, but we contacted them and they introduced us to more people and they introduced us to more people. And so we were able to pretty quickly actually build up a network.
