Ryan Worrell

The Changelog: Software Development, Open Source

Reinventing Kafka on object storage (Interview)

Making your application work on that first and then trying to add object storage on top is a very popular thing that people try to do. They always call it tiered storage. Basically, every system that has that calls it tiered storage. And it's very hard to match the characteristics of those two things together going top down.

2023.323

Whereas going bottom up the other direction, starting with object storage and then layering stuff on top, it seems like it should be the same, but it's not. You don't end up making the same design decisions along the way. And that has a big influence on the overall characteristics of the system. And I can explain specifically what that means for Kafka in terms of tiered storage.

2042.316

So they were thinking about disks first, like local NVMe SSDs. That's usually what people are running on these days in the cloud. The way that influences the design is that the way they implement tiered storage is they just take those log files on disk that have all the records in them, and they copy them over to object storage. That solves a cost problem.

2064.064

If you never want to read that data again, you're good. That's cool. It's much cheaper now. When you want to come back and read it, let's say that you wanted to read all of it, like all of the data you've ever tiered off into storage, the way that works in the open source project is that you'll end up reading all of that data, and you're going to have to pull it back through one of the brokers.

2087.865

There's no way for you to parallelize that processing because they just view it as this bunch of log files that I put into object storage. And with WarpStream, we've kind of decoupled the idea of the local storage being owned by one machine to now there's a metadata layer that says, these are all the files that exist.

2113.247

And then we have all these stateless agent things that can actually pull the data out of object storage for you. So you can scale up and down as quickly as you need to, to read all that data out of object storage. So if you wanted to pull it all out, you can scale up temporarily for the hour that you want to run some big batch job and then scale back down at the end.

2136.562
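
The idea in the two segments above, a metadata layer that knows which files exist plus stateless agents that can each read any file, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `object_store`, `list_files`, `read_file`, and `read_all` are hypothetical names, an in-memory dict stands in for object storage, and a thread pool stands in for a fleet of agents. It is not a description of how any real product is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for object storage: file name -> bytes.
object_store = {f"segment-{i:04d}": f"records-{i}".encode() for i in range(8)}


def list_files() -> list[str]:
    """Stand-in for a metadata layer: it only answers 'which files exist?'."""
    return sorted(object_store)


def read_file(name: str) -> bytes:
    """Work done by a stateless agent: any worker can fetch any file."""
    return object_store[name]


def read_all(num_agents: int) -> list[bytes]:
    """Fan the file list out over a pool that exists only for this job."""
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        # pool.map returns results in the same order as the input file list.
        return list(pool.map(read_file, list_files()))
```

Because no worker owns a file, `num_agents` can be raised for one large batch read and lowered afterwards without moving any data; the result is the same either way.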

With the open source tiered storage in Kafka, that's a lot harder because they started with the local disk part, which makes sense because that's what existed before. It just means that when you're adding stuff on afterwards, usually the tiered storage, the lower layers of storage, is like a secondary concern. It doesn't get as much love and attention

2155.275

as the primary storage gets, and you end up with a very different system at the end.

2174.386

Yeah. So Kafka has, let's start with topics. Topics are basically just a name for mapping consumers and producers together. They agree on the name of a topic for where they're going to send the data to and where they're going to consume the data from. And within a topic, there are partitions. And a partition is basically just a shard to make that topic scalable.

2201.488

There are a lot of different ways to decide which shard you're going to write the data to. But let's just say, for now, you do it by hashing the key of the message and then routing it to the shard based on the hash of that key. So if you have a record with the same key, it'll end up going to the same broker every time, the one that owns that partition. So that's how it works in the open source product.

2224.521
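
The key-hashing scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not Kafka's code: `partition_for`, `broker_for`, and the six-partition, three-broker layout are hypothetical, and `zlib.crc32` is a stand-in hash picked only because it ships with Python (Kafka's Java client uses a murmur2 hash in its default partitioner).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index by hashing the key.

    zlib.crc32 is a stand-in hash for this sketch only.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# Hypothetical layout: 6 partitions, each owned by one of 3 brokers.
partition_owner = {p: f"broker-{p % 3}" for p in range(6)}


def broker_for(key: bytes) -> str:
    """Route a key to whichever broker owns the key's partition."""
    return partition_owner[partition_for(key, len(partition_owner))]
```

Since the hash is deterministic, calling `broker_for(b"user-42")` repeatedly always returns the same broker, as long as the partition count and ownership map stay unchanged.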
