Jeff Kao
So the median latency is about 50 milliseconds.
And that's because when we were building this system and watching customer queries, we were seeing that most customers tend to enter the right thing, more or less.
They tend to enter their address correctly, even if there are some spelling mistakes, so they tend to have a fairly predictable sort of query pattern.
And that's sort of why I described the FST as a high-cardinality text cache: it depends on what you're querying, but something like 70 to 80% of queries, especially for US and Canada addresses, will just hit the FST.
And we have the sort of luxury of being able to store all of that in memory because it's so compressed.
And, you know, in Rust we're writing it as a single process with multi-threading for concurrency, versus something like Node or Python, where, I mean, there are starting to be thread concepts, but the classical way is just to run multiple processes.
And that means you need multiple copies of the data structure in memory.
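That single-copy point can be sketched in plain Rust: one read-only structure behind an `Arc`, shared by every worker thread, rather than one copy per process. The names here are illustrative, not the actual service code:

```rust
use std::sync::Arc;
use std::thread;

// Fan reads out across threads. Cloning the Arc only bumps a reference
// count; the underlying Vec is never duplicated, unlike forked worker
// processes, which would each carry their own copy of the data.
fn total_len_across_threads(data: Arc<Vec<String>>, workers: usize) -> usize {
    let mut handles = Vec::new();
    for _ in 0..workers {
        let shared = Arc::clone(&data);
        // Each thread reads the same structure concurrently.
        handles.push(thread::spawn(move || shared.len()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // One copy of the (read-only) lookup data for the whole process.
    let data: Arc<Vec<String>> = Arc::new(vec![
        "100 main st".to_string(),
        "200 king st w".to_string(),
    ]);
    println!("total: {}", total_len_across_threads(data, 4));
}
```

With a multi-process model, the same four workers would hold four copies of the structure; here they hold one, which is what makes keeping everything in memory affordable.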
And so oftentimes we're hitting those.
At least on that side of things, those operations are usually at most single-digit milliseconds, depending on the query, but a lot of times we'll see things come back in less than a millisecond.
For reverse geocoding, as I mentioned, to speed up your algorithms it's often a question of: can you sort it, or can you hash it? And essentially we have hashed it by taking the S2 library, where we convert the two dimensions of latitude and longitude into one dimension, and that's essentially just a key that you look up in RocksDB. Many of our workloads look like that, and once it's like that, it's almost no different from a primary key lookup, right? It's just a key-value lookup.
So we're seeing less than a millisecond for those types of workloads.
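A toy version of that "hash it" step, with a plain bit-interleaved (Z-order) key standing in for a real S2 cell ID and a `HashMap` standing in for RocksDB; all names are illustrative, and S2's actual cell IDs are computed differently (on a cube projected onto the sphere):

```rust
use std::collections::HashMap;

// Quantize lat/lng to 16 bits each and interleave the bits into one
// u32 (a Morton / Z-order code). The effect mirrors an S2 cell ID:
// two dimensions collapse into a single sortable, hashable key.
fn cell_key(lat: f64, lng: f64) -> u32 {
    let x = (((lat + 90.0) / 180.0) * 65535.0) as u32;
    let y = (((lng + 180.0) / 360.0) * 65535.0) as u32;
    let mut key = 0u32;
    for bit in 0..16 {
        key |= ((x >> bit) & 1) << (2 * bit);
        key |= ((y >> bit) & 1) << (2 * bit + 1);
    }
    key
}

fn main() {
    // HashMap standing in for RocksDB: reverse geocoding becomes a
    // single get by key, i.e. a primary-key-style lookup.
    let mut store: HashMap<u32, &str> = HashMap::new();
    store.insert(cell_key(43.6532, -79.3832), "Toronto, ON");
    println!("{:?}", store.get(&cell_key(43.6532, -79.3832)));
}
```

Once the coordinates are a single key, the whole workload is a point lookup, which is why those queries come back in under a millisecond.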
Right, so in terms of our code base, it's about 150,000 lines, which I would say is actually not very much.
And I haven't taken the time to analyze which code line is serving which use case.
But I will also say there's a good percentage of stuff that's kind of hard-coded right now that we could probably... I think to get a minimal implementation, maybe you could...
When we first started doing address validation, it was like a couple thousand lines.