Jeff Kao

Speaker
514 total appearances

Podcast Appearances

Rust in Production
Radar with Jeff Kao

So in a lot of ways, the FST is almost like a cache.

You can almost think of it as like a hash map, or maybe even a BTreeMap, of a string to a u64.

Like that's how you would just, it's maybe a compressed version of that.

But it's even more like, more than that, it's really just like a way to cache high cardinality text in a very compressed way.
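To make the "map of a string to a u64" analogy concrete, here is a minimal sketch using a plain `BTreeMap` from the standard library. It is not an FST and has none of the compression the speaker describes; it only shows the ordered key-to-integer interface. The function names and the record IDs are hypothetical.

```rust
use std::collections::BTreeMap;

/// Build an ordered map from street names to record IDs (illustrative data).
fn build_street_map() -> BTreeMap<String, u64> {
    let mut map = BTreeMap::new();
    map.insert("broadway".to_string(), 1);
    map.insert("prince street".to_string(), 2);
    map
}

/// Return the values of all keys that start with `prefix`, in key order.
/// Because the keys are sorted, we can jump to the first candidate and
/// stop as soon as a key no longer matches.
fn prefix_lookup(map: &BTreeMap<String, u64>, prefix: &str) -> Vec<u64> {
    map.range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(_, value)| *value)
        .collect()
}
```

An FST supports the same kind of exact and prefix lookups, but shares common prefixes and suffixes between keys, which is where the compression comes from.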

So I'll give a quick rundown of these.

So for any sort of search system, the most fundamental data structure is this thing called an inverted index.

So that implies like a forward index.

So maybe I'll explain what that is first.

And so like forward index is more like a traditional database.

So record one maps to Broadway, record two maps to Prince Street.

The inverted index sort of switches that around.

So you first tokenize, and that is its own whole topic.

People research how to tokenize text, especially with the whole AI and machine learning trend now.

But you can then say, oh, Broadway, the token, maps to ID 1, and then Prince maps to Document 2, and Street maps to Document 2.

So when you type in Prince, then it's a...

I mean, it's not a hash map in these implementations, but essentially you just look up the word Prince and then you get all the documents that are related.
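The forward-to-inverted flip described here can be sketched in a few lines. This is only an illustration: it uses a `HashMap` as a stand-in (the speaker notes that real implementations do not), a naive lowercase whitespace tokenizer, and hypothetical function names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Naive tokenizer: split on whitespace and lowercase each token.
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Turn a forward index (record ID -> text) into an inverted index
/// (token -> set of record IDs containing that token).
fn build_inverted_index(forward: &[(u64, &str)]) -> HashMap<String, BTreeSet<u64>> {
    let mut index: HashMap<String, BTreeSet<u64>> = HashMap::new();
    for (id, text) in forward {
        for token in tokenize(text) {
            index.entry(token).or_default().insert(*id);
        }
    }
    index
}

/// Look up a single term and return the matching record IDs in ascending order.
fn lookup(index: &HashMap<String, BTreeSet<u64>>, term: &str) -> Vec<u64> {
    index
        .get(&term.to_lowercase())
        .map(|ids| ids.iter().copied().collect())
        .unwrap_or_default()
}
```

With the forward index `[(1, "Broadway"), (2, "Prince Street")]`, looking up "Prince" yields record 2, matching the example in the conversation.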

And so there's, once you have these documents, you can sort of perform these set operations to essentially narrow down which documents are relevant.
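As a rough sketch of those set operations: each query term yields a set of document IDs (a posting list), and combining terms with AND or OR is an intersection or a union of those sets. The helper names and the IDs in the usage note are hypothetical, and real engines use more compact posting-list encodings than `BTreeSet`.

```rust
use std::collections::BTreeSet;

/// AND semantics: keep only the documents present in every posting list.
/// An empty input yields an empty result.
fn intersect_all(postings: &[BTreeSet<u64>]) -> BTreeSet<u64> {
    let mut lists = postings.iter();
    let first = match lists.next() {
        Some(set) => set.clone(),
        None => return BTreeSet::new(),
    };
    lists.fold(first, |acc, set| acc.intersection(set).copied().collect())
}

/// OR semantics: keep every document present in at least one posting list.
fn union_all(postings: &[BTreeSet<u64>]) -> BTreeSet<u64> {
    postings.iter().flat_map(|set| set.iter().copied()).collect()
}
```

For example, if "prince" matches documents {2, 3} and "street" matches {2, 4}, the intersection narrows the result to {2}, while the union widens it to {2, 3, 4}.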

And then there's, you know, Tantivy offers this thing called BM25, which you can think of as like TF-IDF.
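For readers who have not seen BM25, the following is a minimal sketch of the standard per-term scoring formula, written from the textbook definition rather than taken from Tantivy's source. The constants 1.2 and 0.75 are commonly used defaults and are an assumption here, as are the function and parameter names.

```rust
/// BM25 free parameters (commonly used defaults, assumed for this sketch).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency: rarer terms get a higher weight.
/// Uses the non-negative variant ln(1 + (N - df + 0.5) / (df + 0.5)).
fn idf(total_docs: u64, docs_with_term: u64) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document's score.
/// Term frequency saturates (controlled by K1) and is normalized by
/// document length relative to the average (controlled by B).
fn bm25_term_score(
    term_freq: u64,
    doc_len: u64,
    avg_doc_len: f64,
    total_docs: u64,
    docs_with_term: u64,
) -> f64 {
    let tf = term_freq as f64;
    let length_norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    idf(total_docs, docs_with_term) * tf * (K1 + 1.0) / (tf + K1 * length_norm)
}
```

The TF-IDF resemblance is visible in the shape of the last line: an IDF factor multiplied by a term-frequency factor, with the extra saturation and length normalization being what distinguishes BM25. A document's score for a multi-term query is the sum of these per-term scores.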

But essentially, once you have these documents,