Ryan Worrell
Podcast Appearances
Thanks, it's great to be here.
Yeah, Kafka is both a very interesting and a very boring system. The easiest way to think about it is that it lets you create topics, and you can have producers that write messages into those topics and consumers that read messages out of them. It's kind of a publish/subscribe type of deal.
But what makes it interesting is that once you consume those messages, they're not deleted. They're still stored inside the system, so another consumer can go and read them again for a different purpose. If you have two different applications consuming the same data set, they can each consume all of those messages independently.
Let's say you have one application that does machine learning training and another that does alerting, and you want both of them to process the same messages, each in its own way. Kafka is a useful tool for that.
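To make the topic/producer/consumer model concrete, here is a toy in-memory sketch in Python. `Topic`, `Consumer`, `produce`, and `poll` are hypothetical names, not the Kafka client API, and the model has only a single partition. What it shows is that reading never removes messages: each consumer just advances its own offset, so the training and alerting applications from the example both see every message. (In real Kafka, each application would typically use its own consumer group, and Kafka tracks committed offsets per group.)

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log standing in for a single-partition topic.

    Hypothetical model for illustration only; this is not Kafka's client API.
    """
    name: str
    messages: list = field(default_factory=list)

    def produce(self, value: str) -> int:
        """Append a message and return the offset it was assigned."""
        self.messages.append(value)
        return len(self.messages) - 1


@dataclass
class Consumer:
    """Each consumer keeps its own read position; reading never deletes data."""
    topic: Topic
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        """Return up to max_records messages from this consumer's offset onward."""
        batch = self.topic.messages[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


events = Topic("user-events")
for event in ["login", "click", "purchase"]:
    events.produce(event)

training = Consumer(events)  # e.g. feeds an ML training job
alerting = Consumer(events)  # e.g. feeds an alerting service

training_batch = training.poll()
alerting_batch = alerting.poll()
print(training_batch)        # ['login', 'click', 'purchase']
print(alerting_batch)        # ['login', 'click', 'purchase']
print(len(events.messages))  # 3: nothing was deleted by reading
```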
It also provides ordering for those messages, so if you need to implement an application where you send messages in a certain order and you want that order retained on the other side, Kafka does that for you. Each message is assigned a unique offset within a partition of the topic, and a partition is kind of like a shard.
Within that partition, if you process the messages again, you'll get them back in the same order every time. So you can implement something like state machine replication, or anything else where the ordering matters.
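The ordering guarantee is per partition, not per topic. The sketch below uses a hypothetical `PartitionedTopic` class and picks a partition with `zlib.crc32` of the message key; that hash is an assumption for illustration (the Java client's default partitioner hashes keys with murmur2). Messages with the same key land in the same partition, get increasing offsets, and come back in the same order on every read, which is what makes replaying the log to rebuild state deterministic.

```python
import zlib


class PartitionedTopic:
    """Toy topic split into partitions, each with its own offsets.

    Hypothetical model: crc32 stands in for Kafka's real key hashing.
    """

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # The same key always maps to the same partition.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def produce(self, key: str, value: str) -> tuple:
        """Append to the key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, from_offset: int = 0):
        """Yield (offset, key, value) in offset order, starting at from_offset."""
        log = self.partitions[partition]
        for offset in range(from_offset, len(log)):
            key, value = log[offset]
            yield offset, key, value


def replay(topic: PartitionedTopic) -> dict:
    """Rebuild the latest state per key by re-applying events in offset order.

    Reading partitions one after another is safe here because all events for
    a given key live in a single partition, so per-key order is preserved.
    """
    state = {}
    for p in range(len(topic.partitions)):
        for _offset, key, value in topic.read(p):
            state[key] = value  # later offsets overwrite earlier ones
    return state


orders = PartitionedTopic(num_partitions=3)
orders.produce("order-42", "created")
orders.produce("order-7", "created")
orders.produce("order-42", "paid")
orders.produce("order-42", "shipped")

first = replay(orders)
second = replay(orders)   # replaying the same log yields the same state
print(first["order-42"])  # shipped
```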
Yeah, the reason it's useful is that there just isn't a lot out there that does both of the two main things. It's a publish/subscribe mechanism that's scalable, and it also lets different consumers process the same set of messages without one consumer deleting them for the others.
With a lot of queuing systems, once you consume a message, it's just gone forever. Their purpose is to consume the message and then have it go away, not to reprocess it again in the future.
There are a lot of use cases for Kafka. I'd say the most broadly popular one is moving data from point A to point B, kind of like a dumb pipe.
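The difference from a traditional queue fits in a few lines. This is a hypothetical illustration with plain Python data structures, not a model of any particular queuing product: a `deque` plays the queue, where reading a message removes it, and a list plus per-reader offsets plays the Kafka-style log, where every reader can get the full data set.

```python
from collections import deque

# Queue semantics: consuming a message removes it for everyone.
queue = deque(["m1", "m2", "m3"])
worker_a = [queue.popleft() for _ in range(len(queue))]
worker_b = list(queue)  # nothing is left for a second reader

# Log semantics: messages stay put; each reader only tracks its own offset.
log = ["m1", "m2", "m3"]
reader_offsets = {"a": 0, "b": 0}


def read_all(reader: str) -> list:
    """Return everything after the reader's offset and advance that offset."""
    start = reader_offsets[reader]
    reader_offsets[reader] = len(log)
    return log[start:]


log_a = read_all("a")
log_b = read_all("b")  # the second reader still sees every message
```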
It's used a lot in observability and security workloads, where you have a lot of application servers generating logs and you want to put those logs somewhere temporarily before loading them into something else, say Elasticsearch. Elasticsearch can be a little finicky.
So you want Kafka, which is a much simpler system, in place as a temporary buffer that holds the log messages you want to write to Elasticsearch, in case the Elasticsearch cluster is down, you're doing an upgrade, or something like that. There are a lot of different reasons for it, but Kafka is pretty much the de facto standard for that kind of workload.
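The buffering pattern can be sketched the same way. `FlakySink` is a hypothetical stand-in for an Elasticsearch cluster and `BufferedShipper` for the consumer that writes into it; neither is a real client library. The shipper only advances its offset after a write succeeds, so while the sink is down the logs simply accumulate in the topic and are delivered once it comes back. With a real Kafka consumer this corresponds roughly to committing offsets only after processing (at-least-once delivery), and the buffer only lasts as long as the topic's retention settings keep the data.

```python
class FlakySink:
    """Hypothetical stand-in for an Elasticsearch cluster that can be down."""

    def __init__(self):
        self.available = True
        self.indexed = []

    def bulk_write(self, docs: list) -> None:
        if not self.available:
            raise ConnectionError("sink unavailable")
        self.indexed.extend(docs)


class BufferedShipper:
    """Reads from a retained log; advances its offset only after a successful write."""

    def __init__(self, log: list, sink: FlakySink):
        self.log = log
        self.sink = sink
        self.committed = 0  # next offset to deliver

    def drain(self, batch_size: int = 100) -> int:
        """Deliver pending messages; return how many were written in this call."""
        written = 0
        while self.committed < len(self.log):
            batch = self.log[self.committed:self.committed + batch_size]
            try:
                self.sink.bulk_write(batch)
            except ConnectionError:
                break  # keep the offset; the messages stay in the buffer
            self.committed += len(batch)
            written += len(batch)
        return written


log_topic = []
es = FlakySink()
shipper = BufferedShipper(log_topic, es)

log_topic.extend(["GET /a 200", "GET /b 500"])
initial = shipper.drain()          # both messages written

es.available = False               # e.g. the cluster is being upgraded
log_topic.extend(["GET /c 200", "POST /d 201"])
during_outage = shipper.drain()    # nothing written; logs pile up in the topic

es.available = True
after_recovery = shipper.drain()   # catches up from the committed offset
```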