Mike Stonebraker
doesn't work and isn't going to work anytime soon.
So what do you do?
Well, first of all, we published our benchmark.
It's a thing called Beaver, which is an anonymized and abstracted version of these four data warehouses.
And so if you think you're really good at doing text to SQL, try a real benchmark, not a fake one.
So number two, borrowing from what I just said, if you don't have all the join terms and you don't have the from clause, you're toast.
What's more, if you don't break down the query into simpler pieces, you're toast.
So that says to me that you want to give your retrieval system simpler pieces, which include the from clause and include join terms.
That's number one.
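[Editor's sketch, not from the talk: one way to picture "simpler pieces, which include the from clause and include join terms" is to attach the tables and join conditions to each sub-question before it reaches the retrieval system. The class name, function name, table names, and column names below are all hypothetical and are not taken from Beaver or from the speaker's system.]

```python
from dataclasses import dataclass, field


@dataclass
class QueryPiece:
    """One simplified sub-question plus the schema hints that go with it."""
    question: str
    from_tables: list[str]                              # tables for the FROM clause
    join_terms: list[str] = field(default_factory=list)  # explicit join conditions


def render_hint(piece: QueryPiece) -> str:
    """Format a piece as plain text that could be handed to a retrieval step."""
    lines = [
        f"Question: {piece.question}",
        "FROM " + ", ".join(piece.from_tables),
    ]
    for term in piece.join_terms:
        lines.append(f"JOIN ON {term}")
    return "\n".join(lines)


# Hypothetical decomposition of a larger question into one simple piece.
example = QueryPiece(
    question="Total order amount per customer",
    from_tables=["orders", "customers"],
    join_terms=["orders.customer_id = customers.id"],
)
print(render_hint(example))
```

[The point of the sketch is only that the FROM clause and join terms travel with each piece, rather than being left for the model to guess.]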
Number two, the minute you want to talk to two different structured databases, like your data warehouse and your CRM system, then it's pretty clear to me that doing a structured data join using an LLM is a bad idea.
You're just much better off leaving them as tables and doing the join in SQL.
So our point of view is we are trying out turning everything into tables.
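[Editor's sketch, not from the talk: a minimal illustration of "leaving them as tables and doing a join in SQL", using Python's built-in sqlite3 module with an in-memory database. The two tables stand in for a data warehouse and a CRM system; their names, columns, and rows are invented for the example.]

```python
import sqlite3


def join_warehouse_and_crm() -> list[tuple[str, float]]:
    """Load two toy tables and join them in SQL instead of asking an LLM to."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for a warehouse table and a CRM table.
    conn.executescript("""
        CREATE TABLE warehouse_orders (customer_id INTEGER, amount REAL);
        CREATE TABLE crm_accounts (customer_id INTEGER, account_name TEXT);
        INSERT INTO warehouse_orders VALUES (1, 120.0), (1, 30.0), (2, 75.5);
        INSERT INTO crm_accounts VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    """)
    # The join terms and FROM clause are explicit; the database does the join.
    rows = conn.execute("""
        SELECT a.account_name, SUM(o.amount) AS total
        FROM crm_accounts AS a
        JOIN warehouse_orders AS o ON o.customer_id = a.customer_id
        GROUP BY a.account_name
        ORDER BY a.account_name
    """).fetchall()
    conn.close()
    return rows


for name, total in join_warehouse_and_crm():
    print(name, total)
```

[Accounts with no matching orders drop out because this is an inner join; a LEFT JOIN would keep them.]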
We're working with the Department of Transportation in the city of Munich, Germany.
And they have six people full time who are answering citizens' complaints and queries, which are of the form: how come I don't have enough time to cross this intersection next to my house before the light turns?
All kinds of stuff.
How come the trolley doesn't stop for enough time for me to get on the trolley?
How come the trolley doesn't come more than once an hour?
I mean, it's all this stuff.