Ryan Worrell
Podcast Appearances
Some people have strong opinions about the actual developer programming model of Kafka and feel that it's a little hard to use sometimes. I think that's less of a big deal these days, as more tools have integrated with Kafka. That actually makes Kafka easier to use than some other systems that might have a theoretically easier-to-use programming model, because everything speaks Kafka now.
So those concerns are mostly trumped by the fact that it's the de facto standard. I think what most people are really concerned about, if you don't use Kafka today and you're thinking about bringing it into your company, are two things: how hard is it to run, and how much is it going to cost? Those are typically people's two big blockers. It doesn't have anything to do with a conceptual issue with Kafka. It's those more practical things.
It's a number of different things. I think the first one is yes, being responsible for anything that stores data on local disks is challenging if you want to achieve high availability and high durability of your data. It requires experienced SREs to handle those types of failures when they do occur.
But that, I think, can be dealt with, because people do that with other systems all the time. Most people's problems with Kafka come when they want to scale the cluster up and down in response to load. The open source project doesn't really give you much tooling to help you manage that process.
For example, in the open source project there's no automated tool to rebalance the data among the machines when you add or remove machines. That's kind of a table-stakes feature elsewhere: in a distributed relational database, it would seem kind of silly if you had to run a script to move data between the nodes in the database.
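To make the "run a script" step concrete, here is a minimal Python sketch (not something described in the episode) of the kind of plan an operator writes for Kafka's `kafka-reassign-partitions.sh`, which reads a JSON file passed via `--reassignment-json-file`. It assumes a naive round-robin placement policy; the helpers `round_robin_plan` and `moved_replicas`, the topic name, and the broker IDs are all hypothetical. Real rebalancing tools also weigh partition sizes, rack placement, and current load.

```python
import json


def round_robin_plan(topic, num_partitions, brokers, replication_factor):
    """Hypothetical helper: assign each partition's replicas round-robin.

    Spreads replicas evenly by count only; ignores partition size and racks.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds broker count")
    partitions = []
    for p in range(num_partitions):
        # Shift the starting broker per partition so leaders rotate.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    # Same shape as the reassignment JSON file: version plus partition list.
    return {"version": 1, "partitions": partitions}


def moved_replicas(old_plan, new_plan):
    """Count replicas placed on a broker that did not hold that partition before.

    Each such replica means copying the whole partition over the network.
    """
    old = {(e["topic"], e["partition"]): set(e["replicas"]) for e in old_plan["partitions"]}
    return sum(
        len(set(e["replicas"]) - old[(e["topic"], e["partition"])])
        for e in new_plan["partitions"]
    )


if __name__ == "__main__":
    before = round_robin_plan("events", 6, [1, 2, 3], 3)
    after = round_robin_plan("events", 6, [1, 2, 3, 4], 3)  # broker 4 added
    print(json.dumps(after, indent=2))
    print("replica moves:", moved_replicas(before, after))  # replica moves: 4
```

Even in this toy case, adding one broker forces four full partition copies, which is why the copy time grows with how much data each broker holds.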
But that is true of open source Kafka. There are now other tools you can use alongside it that can take some of this work off your plate, but they're not always the easiest to use either. It's not a self-balancing, self-managing system like a lot of the distributed relational databases are. It takes a little bit more hands-on work.
Another thing that goes along with that comes up if you're storing data for a long period of time: the open source project didn't add a tiered storage feature until very recently.
And the time it takes just to copy the data from machine to machine when you're scaling the cluster up or down can be hours or days, depending on how densely you're running the machines. Some of that is alleviated by the new tiered storage feature, where older data is moved to object storage, but tiered storage doesn't alleviate the inter-AZ networking costs.
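The "hours or days" figure follows from simple arithmetic: moving a broker's data takes roughly the bytes to copy divided by the throughput you allow for replication. The Python sketch below uses made-up example numbers (10 TB per broker, replication throttled to 100 MB/s), and its `local_fraction` parameter is a hypothetical knob for how much retained data still sits on local disk when tiered storage keeps older segments in object storage. Substitute your own measurements.

```python
TB = 10**12
MB = 10**6


def copy_hours(bytes_per_broker, throughput_bytes_per_s, local_fraction=1.0):
    """Estimate hours to re-replicate one broker's data onto another machine.

    local_fraction (hypothetical): share of retained data on local disk.
    1.0 means no tiered storage; a smaller value models older segments
    living in object storage, which do not need to be copied between brokers.
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    bytes_to_copy = bytes_per_broker * local_fraction
    return bytes_to_copy / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    # Example numbers only: a dense 10 TB broker, copying at 100 MB/s.
    print(round(copy_hours(10 * TB, 100 * MB), 1))       # 27.8
    # Same broker, but only ~10% of its data is kept on local disk.
    print(round(copy_hours(10 * TB, 100 * MB, 0.1), 1))  # 2.8
```

The estimate only covers copy time; it says nothing about the cross-AZ traffic charges that remain whether or not tiered storage is enabled.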
And there's another post on our blog about tiered storage and Kafka if people are interested in learning more about that topic.