Ryan Worrell

The Changelog: Software Development, Open Source

Reinventing Kafka on object storage (Interview)

The brokers own some set of partitions from a leadership perspective. And then there's also replicas of that that are just copying the data. And it's just other brokers that are the replicas for those partitions. So the broker will write that data that it receives from a producer client down to the local disk and replicate it out to the followers. And then

2247.467

a consumer can come along and read, either from a replica or the leader, the data that the producer wrote. But they're all coordinating on the fact that essentially one of those brokers owns the specific partition that I'm interested in reading. So that's how it works in the open source product, and in WarpStream, we've decoupled the idea of ownership of a partition from the broker itself.

2272.159
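The leader-and-replica model described for open source Kafka can be sketched in a few lines. This is a toy, in-memory illustration only, not Kafka's implementation: the `Broker` and `Partition` classes and their methods are hypothetical names, and real brokers write to local disk and replicate over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """A toy broker holding one in-memory log per partition it hosts."""
    name: str
    logs: dict[str, list[bytes]] = field(default_factory=dict)

    def append(self, partition: str, record: bytes) -> int:
        log = self.logs.setdefault(partition, [])
        log.append(record)
        return len(log) - 1  # offset of the record just written


@dataclass
class Partition:
    """Ties a partition to one leader broker and its follower replicas."""
    name: str
    leader: Broker
    followers: list[Broker]

    def produce(self, record: bytes) -> int:
        # The leader writes first, then copies the record to every follower.
        offset = self.leader.append(self.name, record)
        for follower in self.followers:
            follower.append(self.name, record)
        return offset

    def consume(self, offset: int, from_broker: Broker) -> bytes:
        # A consumer may read from the leader or from any replica.
        return from_broker.logs[self.name][offset]


b1, b2, b3 = Broker("b1"), Broker("b2"), Broker("b3")
orders = Partition("orders-0", leader=b1, followers=[b2, b3])
first = orders.produce(b"hello")
second = orders.produce(b"world")
```

The point of the sketch is that the data for `orders-0` lives on specific brokers, which is the coupling the speaker says WarpStream removes.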

We have a metadata store that runs inside our control plane that has a mapping of, here are all the files in object storage. And within those files, the data for this partition for this offset is here. It's in some section of a file in object storage.

2301.499

So any of our agents, which are like the stateless broker that speaks the Kafka protocol to your clients, any one of those agents can consult the metadata store and ask, "I want to read this topic partition at offset X. Where do I have to go in object storage, and potentially multiple places in object storage, to read that data?"

2320.427
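A minimal sketch of that lookup, assuming the metadata store keeps, per topic partition, a sorted list of offset ranges that each point at a byte range inside an object. The `Segment` fields, the `MetadataStore` class, and the object keys are hypothetical; the interview does not describe WarpStream's actual schema or API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A contiguous run of offsets for one partition inside one object."""
    first_offset: int  # first offset stored in this section
    last_offset: int   # last offset stored in this section (inclusive)
    object_key: str    # hypothetical object storage key
    byte_start: int    # where the section starts inside the object
    byte_length: int   # how many bytes the section spans


class MetadataStore:
    """Maps (topic, partition) to segments sorted by first_offset."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, int], list[Segment]] = {}

    def register(self, topic: str, partition: int, seg: Segment) -> None:
        segs = self._index.setdefault((topic, partition), [])
        segs.append(seg)
        segs.sort(key=lambda s: s.first_offset)

    def locate(self, topic: str, partition: int,
               offset: int, max_records: int) -> list[Segment]:
        """Return every segment that must be fetched to read up to
        max_records records starting at offset."""
        segs = self._index.get((topic, partition), [])
        starts = [s.first_offset for s in segs]
        # Jump to the last segment that starts at or before the offset.
        i = max(bisect_right(starts, offset) - 1, 0)
        end = offset + max_records - 1
        return [s for s in segs[i:]
                if s.last_offset >= offset and s.first_offset <= end]


store = MetadataStore()
store.register("orders", 0, Segment(0, 99, "files/a", 0, 4096))
store.register("orders", 0, Segment(100, 199, "files/b", 512, 4096))
hits = store.locate("orders", 0, offset=90, max_records=20)
print([s.object_key for s in hits])  # ['files/a', 'files/b']
```

Because the answer is just a list of object keys and byte ranges, any agent can serve the read. The read starting at offset 90 spans two files, which matches the "potentially multiple places in object storage" remark.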

But because the metadata store inside the control plane is handling the ordering aspect of it, essentially, you get the same guarantees as Kafka in terms of I have this message with this key that's routed to this topic partition, and I want them to stay in the same order because I'm writing them in a specific order. That ordering part is enforced by the metadata store inside the control plane.

2342.685
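One way a central metadata store could enforce per-partition ordering is to assign offsets at commit time, one commit at a time. The sketch below assumes that design; the interview only says the metadata store enforces ordering, so the `OrderingStore` class, its `commit` method, and the object keys are hypothetical.

```python
import threading
from collections import defaultdict


class OrderingStore:
    """Assigns offsets at commit time, one commit at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_offset: dict[tuple[str, int], int] = defaultdict(int)
        self._log: dict[tuple[str, int], list[tuple[int, str, int]]] = defaultdict(list)

    def commit(self, topic: str, partition: int,
               object_key: str, record_count: int) -> int:
        """Record that object_key holds record_count new records and
        return the base offset assigned to the first of them."""
        key = (topic, partition)
        with self._lock:  # commits are serialized, so offsets never overlap
            base = self._next_offset[key]
            self._next_offset[key] = base + record_count
            self._log[key].append((base, object_key, record_count))
            return base


ordering = OrderingStore()
# Two different agents commit files for the same partition.
base_a = ordering.commit("orders", 0, "agent-a/file-1", record_count=3)
base_b = ordering.commit("orders", 0, "agent-b/file-7", record_count=2)
# A different partition has its own independent offset sequence.
base_c = ordering.commit("orders", 1, "agent-a/file-2", record_count=5)
```

In this sketch the agents can upload files in any order and from any machine; the order of the partition is fixed only by the order in which commits reach the store.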

But the data plane part of actually moving all of those messages around is only inside the agents and object storage. So it lets you do that thing that I was saying before, where if you want to scale up and down, it's very easy to do that because you don't have to rebalance those partitions, which take up space on the local disk amongst the brokers in order to facilitate that.

2364.234

In terms of being faster, it's faster in the sense that there is no rebalancing that happens, because the data is always just in object storage somewhere. You don't have to do any rebalancing for it. That part of it is faster. There's obviously a trade-off when you do this, in that the latency of writing to object storage is higher than writing to the local disk.

2391.066

So if you want your data to be durable, you have to wait for the data to be written to object storage first. So the primary trade-off somebody that's using WarpStream would be making is that they're comfortable with around 500 milliseconds at the P99 of latency to write data to the system.

2409.735

And then the end-to-end latency, where a producer sends data and then it's consumed by a consumer, is somewhere between one and one and a half seconds, again at the P99.

2428.992

So it's interesting that you use that word real-time because we've talked to a ton of different Kafka users. And when you ask them, "What is the end-to-end latency of your system today?" a lot of them don't know the answer. They think that they know the answer: "Well, it's real-time." Yeah, they're either not measuring it, or they're measuring it in a weird and incorrect way.

2448.515
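A rough sketch of one way to measure end-to-end latency: record when each message was produced and when it was consumed, then report a high percentile rather than an average. It assumes the producer and consumer clocks are synchronized, and it uses the nearest-rank percentile definition; both function names are hypothetical.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def end_to_end_latencies(samples: list[tuple[float, float]]) -> list[float]:
    """Each sample is (produce_time, consume_time) in seconds for one record."""
    return [consumed - produced for produced, consumed in samples]


# Synthetic data: 100 records produced at t=0 and consumed 0.01s to 1.00s later.
samples = [(0.0, i / 100) for i in range(1, 101)]
latencies = end_to_end_latencies(samples)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

With the synthetic data above, the median is half a second while the P99 is close to a full second, which is why quoting a single "real-time" figure hides the tail that figures like the ones mentioned in this conversation describe.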
