Kevin Rose
So they're actually stored in your database as those pieces of math.
And when you apply some of these clustering algorithms on top of them, they get really good at nuanced information where keyword search would completely fall down.
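As a rough illustration of clustering on top of stored vectors, here is a minimal sketch. The transcript does not say which clustering algorithm is used, so the greedy threshold approach, the function names, the toy 3-dimensional vectors, and the 0.9 threshold are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(vectors: Sequence[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Assign each vector to the first cluster whose first member is similar enough.

    This is a deliberately simple stand-in (an assumption), not the method
    described in the talk.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no existing cluster matched, start a new one
    return clusters


# Toy vectors standing in for real article embeddings (hypothetical data).
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]
clusters = greedy_cluster(toy_vectors, threshold=0.9)
print(clusters)
```

The first two vectors point in nearly the same direction and land in one cluster; the third ends up on its own.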
So the old school ways back in the day in 2004 when I launched, you know,
If you search for Apple releases whatever, it's looking for the word Apple, it's looking for release, it's looking for whatever it was back then that they were doing, like an iPod or something, right?
And it just does it based on just, can I find that text in the database?
And if so, show me back the article.
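The keyword lookup described here can be sketched in a few lines. This is only an illustration of plain text matching, not the actual 2004 implementation; the function name and the sample articles are made up for the example.

```python
from typing import List


def keyword_search(query: str, articles: List[str]) -> List[str]:
    """Return articles that contain every query word as plain text."""
    terms = query.lower().split()
    return [a for a in articles if all(t in a.lower() for t in terms)]


# Hypothetical article titles.
articles = [
    "Apple releases new iPod",
    "Google updates its search index",
]

hits = keyword_search("apple releases", articles)
misses = keyword_search("apple launches", articles)
print(hits)
print(misses)
```

The second query means roughly the same thing as the first but returns nothing, because the literal word "launches" never appears in the text.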
The beautiful thing about what we have today, with our understanding of linguistics and around using vector embeddings and algorithms on top of that, is that you can say there is a difference, even though they both have the same keywords. There is a huge difference between "Apple sues Google" and "Google sues Apple."
And that is impossible to do with keyword search because you're not understanding at a deep level what's going on here, right?
Right.
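A small sketch of why keyword matching cannot separate those two headlines: treated as bags of words they are identical, and only the word order differs. The helper name is hypothetical.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase words, ignoring their order."""
    return Counter(text.lower().split())


a = "Apple sues Google"
b = "Google sues Apple"

same_keywords = bag_of_words(a) == bag_of_words(b)
same_order = a.lower().split() == b.lower().split()
print(same_keywords, same_order)
```

A representation that takes word order into account, as embedding models generally do, at least has the chance to place the two headlines at different points.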
So anyway, this is a very rich, purposely rich, longer-form version of the TL;DR, used just for vector embeddings.
And then I also wanted to create some key points here that we can use to feed into other models later when we're comparing the difference between articles, when we see multiple articles starting to get clustered together.
And then I don't use this, but I asked AI like, hey, write me like a spicier title, like a title that people might click on more, find more interesting, and just to kind of rewrite the title.
And TechMeme does this too.
Like when you go to the front page of TechMeme, it's not the title of the article, it's like actually what their editors chose to write.
I just wanted to see how this looked.
And then I wanted to put it in one of three different categories, tech core and a couple other categories, largely because there's a lot of stuff that comes through these tech feeds, especially when you add in like Forbes and some of these others, where it's like, you know, it does not relate to actual core tech or AI or the things I care about.
And I just want to put them in a bin.
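The per-article fields described above (the rich summary, the key points, the rewritten title, and the category bin) could be held in a record like the following. This is a sketch under assumptions: every field name is hypothetical, and only "tech core" is named in the talk, so the other two categories are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Category(Enum):
    TECH_CORE = "tech_core"          # the one category named in the talk
    PLACEHOLDER_B = "placeholder_b"  # hypothetical, not named in the talk
    PLACEHOLDER_C = "placeholder_c"  # hypothetical, not named in the talk


@dataclass
class ArticleEnrichment:
    original_title: str
    rich_summary: str                 # longer-form summary, used for the embedding
    key_points: List[str] = field(default_factory=list)
    alternate_title: str = ""         # the "spicier" rewrite, generated but unused
    category: Category = Category.TECH_CORE


def bin_by_category(items: List[ArticleEnrichment]) -> Dict[Category, List[ArticleEnrichment]]:
    """Group enriched articles into one bin per category."""
    bins: Dict[Category, List[ArticleEnrichment]] = {c: [] for c in Category}
    for item in items:
        bins[item.category].append(item)
    return bins


# Hypothetical sample records.
items = [
    ArticleEnrichment("Chip maker ships new GPU", "A long summary...", ["new GPU"]),
    ArticleEnrichment("Ten ways to save money", "A long summary...",
                      category=Category.PLACEHOLDER_B),
]
bins = bin_by_category(items)
print({c.value: len(v) for c, v in bins.items()})
```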
And so here's the embedding.
I'm using the large model from OpenAI.
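For reference, requesting an embedding from OpenAI's Python client looks roughly like the sketch below. The transcript only says "the large model", so the model name `text-embedding-3-large` is an assumption, and `embed_text` is a hypothetical helper. The client is passed in so the function can be exercised without network access.

```python
from typing import Any, List

# Assumption: "the large model" refers to OpenAI's text-embedding-3-large.
EMBEDDING_MODEL = "text-embedding-3-large"


def embed_text(client: Any, text: str, model: str = EMBEDDING_MODEL) -> List[float]:
    """Request an embedding for one piece of text and return it as a list of floats.

    `client` is expected to behave like an `openai.OpenAI` instance,
    i.e. expose `client.embeddings.create(model=..., input=...)`.
    """
    response = client.embeddings.create(model=model, input=text)
    return list(response.data[0].embedding)


# Typical usage (requires the `openai` package and an API key in the environment):
#
#   from openai import OpenAI
#   client = OpenAI()
#   vector = embed_text(client, "A longer-form summary of the article...")
```

The returned list is what would then be stored in the database and compared against other articles.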