Jeff Kao
๐ค SpeakerAppearances Over Time
Podcast Appearances
And I think there's an engineer who works there now.
I only know, I only remember his, his GitHub name because it's very memorable, like burnt sushi, but he works at UV if, if I remember correctly, and he's come up with a lot of really interesting rust crates.
And I think he even implemented the regex or had the, did the regex implementation of rust.
And yeah, so we make a lot of use of his libraries, but FST is essentially a character graph.
And so there's this concept of like a try that's very typically used for prefix queries.
And so you can sort of think of an FST as like a try where, you know, all the prefixes that are shared compress, but it also compresses the suffixes.
So now you have almost like double compression, right?
And essentially doing text lookups is a traversal of the graph.
And you can see how that sort of primitive lets you do many things.
It lets you implement like a regex because the regex sort of works the same way.
And so for fuzzy search, you can implement like Levenstein distance by essentially keeping away of dropping or like if you type something incorrectly,
you can have like certain thresholds that, you know, we, we use like an edit distance of one because more than that, it's, it's very, it's essentially combinatorially expensive.
And we do, we have other techniques to, for, for typo tolerance, but we use this library essentially for, uh,
One, obviously prefix search, but then two, it does have some built-in ways to do Levenstein distance, which we're actually starting to move off of a little bit because now we're actually, you know, as the project is starting to succeed, we're getting more queries.
And so because you know what users typed and what their end result actually was, we're able to, you know, at scale sort of figure out
oh, these typos actually map to this type of, this actual correct spelling.
And so we're able just to do those lookup much quicker.
But the funny part is for some of these, like for some of these cases, we actually also index that into the FST.