Gavin Baker
Podcast Appearances
to identify a small number of previously known minor vulnerabilities.
These vulnerabilities all appear relatively simple.
And we have found that other publicly available models are able to discover them as well without requiring a bypass.
So they're saying this isn't that big of a jailbreak.
This isn't that big of a deal.
We can move forward safely.
We've taken safety seriously.
But that is the debate that's going on between the admin and Anthropic.
So as everyone knows, the rollout of Fable 5 has been a bit rocky.
Incredible demos, incredible benchmarks, but offset by the odd decision to silently degrade the quality of answers related to frontier AI development instead of just refusing the request, as it does with cyber and bio prompts. That was the main thing that everyone was really confused about.
The actual rationale behind AI development refusal is pretty sound.
People are upset about that if they're working on machine learning systems, recommender systems, anything that requires instantiating an AI product, let alone if you're just, hey, I'm just doing open source AI research and I'd love to use the latest and greatest models to help me.
Now it's not an option.
But at the same time, just think about the logic.
If a model doesn't let you hack a system, you can just have the model build you an unrestricted model that does let you hack that system.
And so clearly, in that scenario, if you want to restrict hacking or bio, you also have to restrict the tool that makes the tool that hacks the system or develops the bioweapon, in theory.
But the choice to silently degrade responses was not well received.
These are important issues.
And there's another timeline where AI leaders are cut from the same cloth as America's elected officials.