Jeff Kao
So in a lot of ways, the FST is almost like a cache.
You can almost think of it as a hash map, or maybe even a B-tree map, from a string to a u64, just a compressed version of that.
But it's really more than that: it's a way to cache high-cardinality text in a very compressed form.
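As a rough sketch of that idea, here's how the standalone fst crate in Rust (Tantivy uses a closely related implementation) maps sorted strings to u64 values; the specific keys below are made up for illustration.

```rust
use fst::Map;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Keys must arrive in lexicographic order; the FST then stores them with
    // shared prefixes/suffixes, so high-cardinality text compresses well.
    let map = Map::from_iter(vec![
        ("broadway", 1u64),
        ("prince", 2u64),
        ("street", 3u64),
    ])?;

    // Lookups behave like a map: string key in, u64 out.
    assert_eq!(map.get("prince"), Some(2));
    assert_eq!(map.get("bowery"), None);
    Ok(())
}
```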
So I'll give a quick rundown of these.
So for any sort of search system, the most fundamental data structure is this thing called an inverted index.
That name implies there's also a forward index, so maybe I'll explain what that is first.
A forward index is more like a traditional database.
So record one maps to Broadway, record two maps to Prince Street.
The inverted index sort of switches that around.
So you first tokenize, and that's a whole topic of its own; people research how to tokenize text, especially with the AI and machine learning trend now.
But you can then say: the token "Broadway" maps to document 1, "Prince" maps to document 2, and "Street" maps to document 2.
So when you type in "Prince", it's not literally a hash map in these implementations, but essentially you just look up the word "Prince" and you get back all the documents that contain it.
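To make that concrete, here's a toy sketch of the idea (my own illustration, not Tantivy's actual code): tokenize the two records from above and build the token-to-document-IDs mapping with a plain hash map.

```rust
use std::collections::HashMap;

fn main() {
    // Forward index: document ID -> text, like rows in a traditional database.
    let docs: Vec<(u32, &str)> = vec![(1, "Broadway"), (2, "Prince Street")];

    // Inverted index: token -> list of document IDs containing that token.
    let mut inverted: HashMap<String, Vec<u32>> = HashMap::new();
    for (doc_id, text) in &docs {
        // Trivial lowercase + whitespace tokenizer; real tokenizers are far more involved.
        for token in text.split_whitespace() {
            inverted.entry(token.to_lowercase()).or_default().push(*doc_id);
        }
    }

    // Typing "prince" amounts to looking up that token and getting its documents back.
    assert_eq!(inverted["prince"], vec![2]);
    assert_eq!(inverted["broadway"], vec![1]);
}
```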
And once you have those document lists, you can perform set operations to narrow down which documents are relevant.
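For example, an AND query over two terms boils down to intersecting their sorted posting lists; the lists below are made up, and this two-pointer walk is just a sketch of the general technique, not Tantivy's internals.

```rust
fn main() {
    // Hypothetical sorted posting lists: document IDs for each query term.
    let prince: Vec<u32> = vec![2, 5, 9, 14];
    let street: Vec<u32> = vec![2, 3, 9, 21];

    // Classic two-pointer intersection over sorted lists: only documents
    // containing both terms survive.
    let (mut i, mut j) = (0, 0);
    let mut both = Vec::new();
    while i < prince.len() && j < street.len() {
        if prince[i] == street[j] {
            both.push(prince[i]);
            i += 1;
            j += 1;
        } else if prince[i] < street[j] {
            i += 1;
        } else {
            j += 1;
        }
    }

    assert_eq!(both, vec![2, 9]);
}
```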
And then Tantivy offers this thing called BM25, which you can think of as something like TF-IDF.
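As a rough illustration of what that scoring looks like, here is the textbook Okapi BM25 contribution of a single term, with the usual default constants; this is a generic formula sketch, not Tantivy's exact implementation.

```rust
/// Textbook Okapi BM25 score contribution of one query term for one document.
/// tf: term frequency in the doc; doc_len / avg_doc_len: length normalization;
/// docs_total / docs_with_term: corpus statistics feeding the IDF part.
fn bm25_term_score(tf: f64, doc_len: f64, avg_doc_len: f64, docs_total: f64, docs_with_term: f64) -> f64 {
    let k1 = 1.2; // controls how quickly repeated terms stop adding score
    let b = 0.75; // controls how strongly long documents are penalized

    // IDF: rare terms count for more than common ones, much like TF-IDF.
    let idf = ((docs_total - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1.0).ln();

    // Term-frequency part: saturates as tf grows, dampened for long documents.
    let tf_part = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len));

    idf * tf_part
}

fn main() {
    // A match on a rare term outscores a match on a very common term.
    let rare = bm25_term_score(1.0, 10.0, 12.0, 1000.0, 5.0);
    let common = bm25_term_score(1.0, 10.0, 12.0, 1000.0, 700.0);
    assert!(rare > common);
}
```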
But essentially, once you have these documents,