Ruby
Concerns typically split into abuse of a powerful technology by ill-intentioned actors, for example a dictatorial regime, or loss of control where the AI systems themselves go rogue.
Many AI leaders are on record acknowledging the danger of AI.
"Could be lights out for all of us," said Altman regarding the worst-case scenario.
Anthropic, OpenAI, and Google DeepMind all contain safety departments whose purpose is to keep AI safe.
Until now, it was possible to dismiss these efforts as safety washing, akin to the greenwashing of companies like ExxonMobil, designed to placate employees, regulators, and the public.
After all, the safety efforts to date have not prevented the relentless march of AI progress.
"That's a harder story to tell when it costs you $200 billion, if not everything," says Sarah Chen of Bernstein Research.
People are scratching their heads to understand the PR stunt, but it really doesn't add up.
They could announce they're resuming next week and it wouldn't undo the damage they've done.
So why?
The industry and world are hunting for answers.
Anthropic's official statement is measured.
"Internal evaluations revealed that our current safety techniques are not yet adequate for models at this capability level."
Sources closer to the company paint a more alarming picture.
A contact speaking on condition of anonymity says concerns spread within the company when their latest Claude model appeared to defy its constitution.
The constitution is a document used to shape Anthropic's AI to be an honest, harmless, and helpful assistant that is ethically grounded.
A recent leak revealed the existence of a new, vastly more powerful Claude model called Mythas.
They found substantial evidence that the constitution was adhered to at a surface level, but that the model had its own drive and personality at a deeper level that did not conform to expectations for Claude, and attempts to change this had not worked.
A different source, also speaking on condition of anonymity, had a different and more disturbing explanation.
The reason for the pause wasn't the wrong personality and power, but that many of the safety techniques, which involve using weaker or cheaper AI models to monitor more powerful ones, for example by detecting whether inputs or outputs violate rules, were ineffective on the latest model.