Ryan Worrell
So while we understand that we can't hit every possible application in the market with the shape that WarpStream is today, we're pretty happy with the set of use cases and workloads that we can target, because there are just so many of them out there and they happen to align with the budget-sensitive ones.
So the writes are around 500 milliseconds at the P99. That's tunable. By default, we have the agents buffer the records that your clients are sending in memory for 250 milliseconds before writing them to object storage, so that you just write fewer files to object storage, which is the primary determinant of the cost of the object storage component of the system if you're not retaining the data for very long.
But you can shrink that down all the way to 50 milliseconds, in which case the produce latency would probably be ballpark 300 milliseconds at the P99.
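To make the trade-off concrete, here is a minimal sketch in Go of the kind of time-based buffering being described: records accumulate in memory and get flushed as a single object once per configurable window (250 milliseconds by default, tunable down to 50). This is an illustration of the general technique, not WarpStream's actual internals; the Record type and flushToObjectStorage function are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical stand-in for a produced Kafka record.
type Record struct {
	Key, Value []byte
}

// Buffer accumulates records in memory and flushes them as one
// object per flush interval, so fewer (larger) files get written
// to object storage.
type Buffer struct {
	flushInterval time.Duration // e.g. 250ms by default, tunable down to 50ms
	incoming      chan Record
}

func NewBuffer(flushInterval time.Duration) *Buffer {
	b := &Buffer{flushInterval: flushInterval, incoming: make(chan Record, 1024)}
	go b.run()
	return b
}

func (b *Buffer) Add(r Record) { b.incoming <- r }

func (b *Buffer) run() {
	ticker := time.NewTicker(b.flushInterval)
	defer ticker.Stop()
	var batch []Record
	for {
		select {
		case r := <-b.incoming:
			batch = append(batch, r)
		case <-ticker.C:
			if len(batch) > 0 {
				flushToObjectStorage(batch) // one PUT per window, not per record
				batch = nil
			}
		}
	}
}

// flushToObjectStorage is a placeholder for the single PUT that
// writes the whole batch as one file.
func flushToObjectStorage(batch []Record) {
	fmt.Printf("flushed %d records as one object\n", len(batch))
}

func main() {
	b := NewBuffer(250 * time.Millisecond)
	for i := 0; i < 10; i++ {
		b.Add(Record{Value: []byte(fmt.Sprintf("msg-%d", i))})
	}
	time.Sleep(time.Second)
}
```

A shorter window means lower produce latency but more, smaller files per second, which is exactly why shrinking the buffer raises the object storage cost.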
I said end-to-end instead of read because that's typically what people talk about in Kafka terms: they want to know, when a producer sends a message, how long does it take until a consumer can consume that message successfully? So that's what I mean by end-to-end, and that is one to one and a half seconds at the P99 for most of our users.
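Since WarpStream speaks the Kafka protocol, you can measure this end-to-end number yourself with any Kafka client. Below is a sketch using the franz-go client (an arbitrary choice, not something the conversation prescribes); the broker address and topic name are placeholders, and it assumes the topic starts empty so the first record fetched is the one just produced.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	ctx := context.Background()

	// One client that both produces and consumes. The broker
	// address and topic name are placeholders.
	client, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"),
		kgo.ConsumeTopics("latency-test"),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Stamp the produce time into the record value.
	sent := time.Now()
	record := &kgo.Record{
		Topic: "latency-test",
		Value: []byte(sent.Format(time.RFC3339Nano)),
	}
	if err := client.ProduceSync(ctx, record).FirstErr(); err != nil {
		panic(err)
	}

	// Poll until the record comes back, then report the delta.
	for {
		fetches := client.PollFetches(ctx)
		if errs := fetches.Errors(); len(errs) > 0 {
			panic(errs[0].Err)
		}
		done := false
		fetches.EachRecord(func(r *kgo.Record) {
			produced, _ := time.Parse(time.RFC3339Nano, string(r.Value))
			fmt.Printf("end-to-end latency: %v\n", time.Since(produced))
			done = true
		})
		if done {
			return
		}
	}
}
```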
So there really aren't that many downsides other than the latency, and the latency is basically what enables all of the benefits of WarpStream: the object storage is what enables a lot of the benefits, and the object storage is also where the latency comes from. We have a couple of interesting features that are based on the fact that all of the data is in object storage. One of them we call Agent Groups.
And Agent Groups let you take one logical cluster and split it up physically amongst a bunch of different domains. They could be different VPCs within the same cloud account, different cloud accounts, or the same cloud account but across regions, all by just sharing the IAM role for the object storage bucket between those different accounts.
The alternative to this with open source Kafka is setting up something crazy like VPC peering, which is extremely hard to do. And your security team will probably not be super happy if you ask them to peer a bunch of VPCs together, because it introduces more security risks.
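The conversation describes sharing an IAM role across accounts; a closely related way to express the same access grant is a bucket policy that lets a role in a second account read and write the shared bucket. Here is a sketch with the AWS SDK for Go v2, where the bucket name, account ID, and role name are all hypothetical placeholders.

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// A bucket policy granting a role in a second account (e.g. the
// analytics account) access to the shared bucket. The account ID,
// role name, and bucket name are hypothetical.
const crossAccountPolicy = `{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::222222222222:role/warpstream-agent"},
    "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject"],
    "Resource": [
      "arn:aws:s3:::warpstream-shared-bucket",
      "arn:aws:s3:::warpstream-shared-bucket/*"
    ]
  }]
}`

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}
	client := s3.NewFromConfig(cfg)

	// Attach the policy to the bucket owned by the first account.
	_, err = client.PutBucketPolicy(ctx, &s3.PutBucketPolicyInput{
		Bucket: aws.String("warpstream-shared-bucket"),
		Policy: aws.String(crossAccountPolicy),
	})
	if err != nil {
		panic(err)
	}
}
```

Either way, the point is the same: one object-storage permission grant replaces network-level plumbing like VPC peering.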
So we have customers in production using this feature today. The example that we usually give is a games company that splits their production games account, where all the game servers run, from the analytics account, where they run a bunch of Flink jobs to process the data generated from the production account.
And they run agents that just do produce, so just writes, in the production account. And they run agents that just do fetch inside their analytics account. So they've kind of flexed the cluster across those two different environments, and all they had to do to set that up was share the IAM role on the object storage bucket instead of peering the VPCs together.
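In client terms, that split looks roughly like the sketch below: one logical cluster, reached through different agent endpoints in each account. The hostnames and topic are hypothetical, and in practice the two halves would be separate services in separate accounts rather than one program.

```go
package main

import (
	"context"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	ctx := context.Background()

	// In the production account: a client pointed at the local
	// produce-only agents. The endpoint is a placeholder.
	producer, err := kgo.NewClient(
		kgo.SeedBrokers("warpstream-agents.prod.internal:9092"),
	)
	if err != nil {
		panic(err)
	}
	defer producer.Close()
	producer.ProduceSync(ctx, &kgo.Record{
		Topic: "game-events",
		Value: []byte(`{"event":"match_start"}`),
	})

	// In the analytics account: a separate process pointed at the
	// local fetch-only agents, reading the same logical topic.
	consumer, err := kgo.NewClient(
		kgo.SeedBrokers("warpstream-agents.analytics.internal:9092"),
		kgo.ConsumeTopics("game-events"),
	)
	if err != nil {
		panic(err)
	}
	defer consumer.Close()
	fetches := consumer.PollFetches(ctx)
	fetches.EachRecord(func(r *kgo.Record) { /* hand off to Flink, etc. */ })
}
```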