Brian O'Grady
They handle things like tokenization very well, including multilingual tokenization and stemming.
But when you introduce this concept of vectors, you're no longer even using the same data structures to do your search.
You're now doing very computationally heavy mathematical operations that are CPU-intensive, and they put a lot of stress on your existing search system. Rather than just doing a lookup in an inverted index to see which search results contain a keyword, you're now saying,
let me do distance comparisons, right?
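The contrast between the two lookup styles can be sketched in a few lines (the index contents and function names here are illustrative, not from the episode):

```python
# Keyword search: an inverted index maps each term to the set of
# documents containing it, so a query is a cheap dictionary lookup.
inverted_index = {
    "red":   {1, 3},
    "shoes": {3, 7},
}

def keyword_search(term):
    # O(1) lookup, no arithmetic at all.
    return inverted_index.get(term, set())

# Vector search: there is no precomputed posting list to consult;
# every candidate requires an arithmetic distance computation that
# touches every dimension of both vectors.
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

The keyword path does no math at query time; the vector path does a multiply and an add per dimension, per candidate.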
Which are very computationally intensive. Let's say you have a vector with 1,024 dimensions.
That means a single vector holds 1,024 floating-point numbers.
So comparing just two vectors means 1,024 per-dimension operations, right?
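To make the cost concrete, here is a minimal sketch of one distance comparison between two 1,024-dimensional vectors (the dimension count comes from the discussion above; the random data is just for illustration):

```python
import random

DIMS = 1024  # one vector = 1024 floats, as described above

a = [random.random() for _ in range(DIMS)]
b = [random.random() for _ in range(DIMS)]

def euclidean(u, v):
    # One subtraction, one multiplication, and one addition
    # per dimension: ~1024 of each for a single comparison.
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

d = euclidean(a, b)
```

That is the work for a single pair; a query against a large catalog repeats it once per item.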
So what I often see happening is that these customers will try to leverage their existing search solution, like Elasticsearch or OpenSearch, to do these vector comparisons.
So they'll generate vectors for all the pieces of text they have in their catalog.
They'll store those vectors next to the catalog item descriptions.
And then they'll try to do a vector search.
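As a rough sketch of what "storing vectors next to the descriptions" looks like, an Elasticsearch mapping can hold both a text field and a `dense_vector` field in the same index (the index name, field names, and dimension count here are assumptions for illustration):

```json
PUT /catalog
{
  "mappings": {
    "properties": {
      "description":        { "type": "text" },
      "description_vector": { "type": "dense_vector", "dims": 1024 }
    }
  }
}
```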
Now, vector search, as I said, is computationally intensive.
And if you said to yourself, oh, like, how do I do vector search over 1 million catalog items?
Your first attempt might be, well, let's say I have a text input.
I convert that into a vector, and I just compare it against all my catalog items.
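That naive first attempt, a brute-force scan over every catalog item, can be sketched like this (function and variable names are illustrative):

```python
def euclidean(u, v):
    # Per-dimension distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

def brute_force_search(query_vector, catalog_vectors, k=5):
    # Score every item: O(N * D) arithmetic per query, where N is the
    # catalog size and D is the vector dimensionality (e.g. 1M * 1024).
    scored = [(euclidean(query_vector, vec), idx)
              for idx, vec in enumerate(catalog_vectors)]
    scored.sort()  # nearest first
    return [idx for _, idx in scored[:k]]
```

With a million 1,024-dimensional items, that is on the order of a billion floating-point operations per query, which is exactly why this approach strains a system built for inverted-index lookups.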