Ryan Worrell
Podcast Appearances
Yeah, we started from scratch and wrote it in Go.
Yeah, there's definitely a lot of high-level conceptual overlap, but the systems are extremely different: one looks more like an OLAP database, and Kafka is more like a log. So there's some very high-level conceptual similarity. I think the thing we really got the most experience with there was object storage.
That's about where the similarities stop. But the deep experience of understanding how object storage works at scale in all of the major public clouds was a hugely valuable learning experience for us, because when we left and were doing the back-of-the-envelope math on whether we could make this thing work, we had that experience to draw on.
What we learned about object storage there was pretty helpful. People talk a lot about object storage nowadays, so its characteristics aren't really an unknown anymore. But I'd say in 2019 it was a fairly different story.
I think the only people who knew a lot about building high-performance systems on top of object storage were probably all either inside the public cloud providers themselves or working at Snowflake or a similar company. The knowledge was not widely distributed at that time. When most people think of object storage, they think of something that's super slow.
They think about it in terms of seconds of latency to do anything. But the real numbers are very different from what people might guess off the top of their head, and that opens up a lot of design possibilities that you don't think of immediately.
Yeah, it's not really one secret trick. It's a conceptual framing: you have to think of it as if you had access to a very large, oversubscribed array of spinning disks. If you think about it like that, how you design a system around it will make a lot more sense. There are a couple of different pieces to that.
Really large, way bigger than your individual application. It's like you have the world's biggest RAID 0 of all the disks ever; it's actually unlimited. So think about it that way. But it's also oversubscribed, so its latency is highly variable. One request might take 10 milliseconds and the next one takes 50, and from your point of view there's no discernible reason why.
That's just how it works. So you have to design around it a little, by retrying requests speculatively and that type of thing. But if you think of it as very large, cheap storage with variable latency, and rework your application around how it would work on top of that, then you've got the right framing.
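Retrying requests speculatively, as described above, is often done as a hedged request: if a read hasn't answered within some threshold, send a duplicate and keep whichever response arrives first. The following is a minimal Python sketch under stated assumptions, not the speaker's implementation. `hedged_request`, `make_fake_get`, the 50 ms hedge threshold, and the simulated 200 ms and 10 ms latencies are all hypothetical, and a real client would issue actual object-storage reads instead of sleeping.

```python
import asyncio
from typing import Any, Callable, Coroutine, List, TypeVar

T = TypeVar("T")


async def hedged_request(
    fetch: Callable[[], Coroutine[Any, Any, T]],
    hedge_after: float,
    max_attempts: int = 2,
) -> T:
    """Illustrative sketch: run fetch(); if no attempt has succeeded within
    hedge_after seconds, launch another copy (up to max_attempts in total).
    Return the first successful result and cancel attempts still in flight."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    pending = set()
    last_error = None
    launched = 0
    try:
        while True:
            # Start an attempt: initially, after the hedge timer fires,
            # or right away when an earlier attempt has failed.
            if launched < max_attempts:
                pending.add(asyncio.create_task(fetch()))
                launched += 1
            if not pending:
                break  # every attempt finished and all of them failed
            # Only arm the hedge timer while more attempts are allowed.
            timeout = hedge_after if launched < max_attempts else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            # Checking exception() on every finished task also marks failures as handled.
            successes = [t for t in done if t.exception() is None]
            if successes:
                return successes[0].result()
            for task in done:
                last_error = task.exception()
        raise last_error
    finally:
        # Cancel slower duplicates so they stop consuming resources.
        for task in pending:
            task.cancel()


def make_fake_get(latencies: List[float]) -> Callable[[], Coroutine[Any, Any, str]]:
    """Hypothetical stand-in for an object-storage GET: call i sleeps for
    latencies[i] seconds to imitate variable request latency."""
    remaining = iter(latencies)

    async def fake_get() -> str:
        delay = next(remaining)
        await asyncio.sleep(delay)
        return f"payload after {delay * 1000:.0f} ms"

    return fake_get


async def demo() -> None:
    # First GET is a 200 ms straggler; the hedge launched at 50 ms takes 10 ms.
    get = make_fake_get([0.200, 0.010])
    loop = asyncio.get_running_loop()
    start = loop.time()
    result = await hedged_request(get, hedge_after=0.050)
    print(result, f"(elapsed ~{(loop.time() - start) * 1000:.0f} ms)")


if __name__ == "__main__":
    asyncio.run(demo())
```

With these simulated latencies, the call should return the second attempt's payload after roughly 60 ms instead of waiting on the 200 ms straggler. The cost is extra request volume, so the hedge threshold is commonly set near a high latency percentile, so that only the slow tail gets duplicated.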
The reason it's so challenging for people today is that they spend all their time thinking about the fastest storage available: persistent memory, NVMe SSDs, stuff like that. They think about that first when they're designing their application: how do I get the lowest possible latency?