Jared
We are told to ask GPT-4o to grade GPT-3.5.
We are told to fix the vibes, but this creates a dangerous circular dependency.
If the underlying models suffer from sycophancy, which is agreeing with the user, or hallucination, a judge model often hallucinates a passing grade.
We are trying to fix probability with more probability.
That is a losing game."
One possible way of dealing with these confident idiots we've introduced into our software stacks the last few years is to stop treating agents like magic boxes and start treating them like software, hence the Steer SDK was created.
Quote, Steer is an open source Python library that intercepts agent failures, such as hallucinations, bad JSON, PII leaks, etc., and allows you to inject fixes via a local dashboard without changing your code.
End quote.
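To make the "treat them like software" idea concrete, here is a minimal Python sketch of deterministic checks run on an agent's raw reply, instead of asking a second model to grade it. This is not the Steer SDK's API: the function name `check_agent_output`, the failure labels, and the simple email pattern are all hypothetical, chosen only for illustration.

```python
import json
import re

# Hypothetical, illustrative pattern; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_agent_output(raw, required_keys):
    """Return a list of failure labels found in an agent's raw reply.

    The checks are plain code, so the same input always gets the same verdict.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["bad_json"]
    if not isinstance(data, dict):
        return ["bad_json"]

    failures = []
    # Structural check: every required key must be present.
    missing = set(required_keys) - set(data)
    if missing:
        failures.append("missing_keys:" + ",".join(sorted(missing)))
    # Content check: flag anything that looks like an email address.
    if EMAIL_RE.search(raw):
        failures.append("pii_email")
    return failures
```

A caller could block or retry the agent whenever the returned list is non-empty. The point of the design is that the verdict comes from code rather than from another probabilistic model.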
Another way of dealing with these confident idiots in our software stacks is to remove them.
But that might not be possible anymore.
Bun is joining Anthropic.
The company behind Bun, which is the open source runtime for Claude Code, is joining Anthropic.
We discussed the big acquisition slash acqui-hire on last week's Friends episode, but at the time I hadn't quite considered this move and how contrary it is to Anthropic's party line that AI agents are replacing software engineers.
From Anthropic's announcement, quote,
and it directly drove the recent launch of Claude Code's native installer.
We know the Bun team is building from the same vantage point that we do at Anthropic, with a focus on rethinking the developer experience and building innovative, useful products.
End quote.
Bun is open source.
Why not just fork it and have a Claude Code-powered engineer make all the necessary changes and upgrades to the runtime that Anthropic needs?
Perhaps because there's no getting there from here.