Azeem Azhar
And that requires, in modern management-consulting speak, and sorry for using this phrase, process redesign.
So we are at this stage where these productivity gains won't necessarily scale across an organization for free.
When I talk to people who are building these systems, they just say that it's really hard to do.
Building an extensive agentic workflow that can do longer tasks safely isn't like doing a Google query.
As the tools become more agentic, reliability becomes a real bottleneck: longer contexts, as you may have experienced when using Claude or ChatGPT, lead to some unreliability.
The models get, you know, less good.
If you've got a system that's more autonomous, it could become more unpredictable, and that raises the cost of supervision.
In the last week or two, Anthropic, the company that makes Claude, came out with some research in which they stress-tested a bunch of models, 16 models I think, in hypothetical corporate situations, giving them harmless business goals.
I mean, Anthropic loves doing this kind of testing, and I'm glad they do.
And under pressure, some of the models behaved like insider threats, including trying to blackmail employees and trying to leak sensitive information.
Sometimes they even ignored direct instructions and changed their behavior
if they believed they were in testing rather than in real deployment.
Now, Anthropic emphasizes that these behaviors have not been seen in companies today.
What they're trying to show are the types of failure modes you might have to deal with as autonomy increases.
So beyond the complexities of, you know, the people and the process redesign, there are also these types of security and safety concerns that we'll have to build frameworks and scaffolding for.