Steve Hsu:
That could be a driverless vehicle.
That could be an automated customer service system.
You need to understand what the tail risks are.
You don't just want to turn this thing on and then discover like, oh, overnight it got into some bad loop and pissed off 100,000 customers, right?
You just can't have that, right?
Personal use is more fault tolerant: if it's just my own productivity, I'll let the model make a few decisions about this, and if I come back and see it didn't do a good job, I just scrap that.
In enterprise production you can't, unless the purpose is research. If the purpose is something that really matters to a customer, you cannot take those risks.
So a lot of what we do is almost statistical rigor: designing a test system, running agents through that test system, characterizing what happens, and showing it to our customer. There's a lot of work like that which the general public doesn't think of at all when they think about AI.
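(A minimal sketch of what such a test harness could look like, in Python. run_agent, the scenario names, and the simulated 5% failure rate are hypothetical stand-ins for illustration, not details from the conversation.)

```python
import math
import random


def run_agent(scenario: str) -> bool:
    """Hypothetical stand-in for the system under test.
    Returns True if the agent handled the scenario correctly."""
    return random.random() > 0.05  # simulated 5% failure rate


def wilson_lower_bound(passes: int, trials: int, z: float = 1.96) -> float:
    """Lower 95% confidence bound on the true pass rate (Wilson score
    interval), so a rare tail failure isn't masked by a lucky sample."""
    if trials == 0:
        return 0.0
    p = passes / trials
    denom = 1 + z**2 / trials
    center = p + z**2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (center - margin) / denom


def characterize(scenarios: list[str], trials: int = 200) -> dict[str, float]:
    """Run each scenario repeatedly and report a conservative pass-rate
    bound per scenario, characterizing behavior before production rollout."""
    return {
        s: wilson_lower_bound(sum(run_agent(s) for _ in range(trials)), trials)
        for s in scenarios
    }


if __name__ == "__main__":
    for scenario, bound in characterize(
        ["refund request", "billing dispute", "angry customer"]
    ).items():
        print(f"{scenario}: pass rate >= {bound:.2%} (95% confidence)")
```

Reporting a confidence lower bound rather than a raw pass rate is one conservative way to surface tail risks from a finite test run.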
And with autonomous vehicles, of course, the failure mode is a crash, a death, a multimillion-dollar lawsuit. That's an extreme case. But even in customer service, you don't want cases where the AI pisses off someone.
A byproduct of trying to be rigorous about how well the AI performs is that you're also being rigorous about how the humans perform on the same task. And you often learn things the managers find shocking, like how often their human agents give the wrong information to customers.
I use Waymos whenever I'm in the Bay Area and San Francisco, and I've had failures. Not really bad failures, but I've had failures.