Mike Stonebraker

Speaker
444 total appearances

Podcast Appearances

The Peterman Pod
Turing Award Winner: Postgres, Disagreeing with Google, Future Problems | Mike Stonebraker

We've been trying to make it work on real-world databases.

especially real-world data warehouses.

So we've been trying the technology on four different production databases, warehouses, where we've gotten the workload, the actual workload that's run, and from the actual users using the system,

And we've gotten them to reverse engineer the text that corresponds to that SQL.

So we have text and SQL for, we have four benchmarks.

Tell me all the professors at MIT who won the Turing Award.

And so an LLM is supposedly good at that.

And so the text-to-SQL benchmarks, there's one called Spider, another one called BIRD.

And the best LLM systems are pretty good at those benchmarks, like 80% accuracy or better.
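
For readers unfamiliar with how a text-to-SQL accuracy figure can be computed, here is a minimal sketch of execution-based scoring: run the predicted query and the reference query against the same database and count a prediction as correct when the two result sets match. The speaker does not say how accuracy is measured on his benchmarks, so the scoring rule, the toy schema (`professor`, `award`), the placeholder rows, and the function names are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs with matching results."""
    correct = 0
    for predicted, gold in pairs:
        expected = run_query(conn, gold)
        actual = run_query(conn, predicted)
        if expected is not None and actual == expected:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Hypothetical toy schema and placeholder rows, not real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE professor (id INTEGER PRIMARY KEY, name TEXT, university TEXT);
CREATE TABLE award (professor_id INTEGER, award_name TEXT);
INSERT INTO professor VALUES (1, 'A', 'MIT'), (2, 'B', 'MIT'), (3, 'C', 'Elsewhere');
INSERT INTO award VALUES (1, 'Turing Award'), (3, 'Turing Award'), (2, 'Other Award');
""")

# Reference SQL for "all the professors at MIT who won the Turing Award".
gold = (
    "SELECT p.name FROM professor p "
    "JOIN award a ON a.professor_id = p.id "
    "WHERE p.university = 'MIT' AND a.award_name = 'Turing Award'"
)

pairs = [
    (gold, gold),  # identical query: correct
    ("SELECT name FROM professor WHERE university = 'MIT'", gold),  # ignores the award
    ("SELECT name FROM profesor", gold),  # misspelled table: query fails
    (
        "SELECT name FROM professor WHERE university = 'MIT' AND id IN "
        "(SELECT professor_id FROM award WHERE award_name = 'Turing Award')",
        gold,
    ),  # different SQL, same rows: correct
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.0%}")
```

Comparing results rather than query text means a differently written but equivalent query still counts as correct, which is why the fourth pair scores.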

Not superhuman, but they're pretty good.

Like, you would consider using them.

And, you know, like the current leaderboard is something like 85% accuracy, which, I mean, it's getting there.

You say maybe it's not quite ready for prime time, but it certainly looks pretty good.

Well, on our benchmarks, large language models get 0%.

And if you enhance them with RAG and all the tricks, it goes to 10%.

And if you give as a prompt the FROM clause, in other words, all the actual tables that need to be accessed, and all the actual join clauses that need to be joined, then accuracy goes to about 35%.
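
To make the idea of handing the model the FROM clause and the join clauses concrete, here is a rough sketch of how such a hint could be folded into a prompt. The prompt wording, the `build_prompt` helper, and the table and column names are hypothetical; the speaker does not describe the actual prompt format used in these experiments.

```python
def build_prompt(question, tables=None, join_conditions=None):
    """Assemble a text-to-SQL prompt, optionally with schema hints.

    tables          -- table names the correct query must read (the FROM clause)
    join_conditions -- join predicates the correct query must use
    """
    parts = [
        "Translate the question into a single SQL query.",
        f"Question: {question}",
    ]
    if tables:
        parts.append("Tables to use (FROM clause): " + ", ".join(tables))
    if join_conditions:
        parts.append("Join conditions:")
        parts.extend(f"  {condition}" for condition in join_conditions)
    parts.append("SQL:")
    return "\n".join(parts)


question = "Tell me all the professors at MIT who won the Turing Award."

# Baseline: the model sees only the question.
bare_prompt = build_prompt(question)

# Hinted: the model is also told which tables and joins to use
# (hypothetical names, for illustration only).
hinted_prompt = build_prompt(
    question,
    tables=["professor", "award"],
    join_conditions=["award.professor_id = professor.id"],
)

print(hinted_prompt)
```

With the hint, the model no longer has to discover which tables are relevant or how they connect; it only has to write the remaining parts of the query, such as the select list and filters.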

So, by definition, this stuff is not ready for prime time, and it's not going to be for a while, if ever.

So what's the difference?

Number one, LLMs are trained on the pile.

Data warehouse data is not in the pile.