Rob Wiblin
Podcast Appearances
And then it doesn't seem like it'll be that hard to train.
I guess if you think you've got the technical methods, then it shouldn't take that long to basically just train a model that can, as an alpha version, assign probabilities to statements being true or false.
Yeah.
That's exactly- That's the plan.
That's the plan.
That's the plan.
It's a lot better than what we have anyway.
As far as I can tell, I think Anthropic... I'm talking a lot about Anthropic because I've been reading the Mythos system card and all the announcements from last week.
As far as I can tell, they've decided to have Mythos monitoring Mythos, basically.
They've tried other models doing it, but they're like, Mythos is smarter, it's better.
But obviously this creates an internal contradiction: if they don't trust Mythos, why do they trust Mythos to monitor itself?
That's one reason why I'm like, even if this model is much less intelligent, I'm like, at least it's an independent judge.
It's built a very different way that might be more likely to flag things and less likely to scheme to support itself, basically.
If we're going to be doing AI research with AI, we really, really want to make sure that AI is going to be an honest one. So I think the majority of people who are piling into technical AI safety have decided to go for improving our chances in the cat-and-mouse game, basically. And I think for the people who are very concerned about what's going on, their reasoning is something like: we're running a 50% chance of absolute disaster now because we're doing a whole bunch of absolutely crazy, reckless stuff.
Maybe just by patching the very dumbest stuff that we're doing, fixing the worst things that are on fire right now, we can bring that risk down to 10%.
Obviously, that's a preposterous risk of disaster to run.
It's an embarrassment to us as a species that we can't do better than that.
Nevertheless, it's a 40-percentage-point improvement in our chances of things going well, or at least reasonably.
Going from 10% down to 0% is only a quarter as valuable as that.
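
The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration: the 50%, 10%, and 0% figures are the speaker's rough numbers from the conversation, the variable names are made up for this example, and it assumes each percentage point of risk reduction is equally valuable.

```python
# Illustrative figures quoted in the conversation (rough estimates, not data).
risk_now = 0.50       # risk of disaster under current practice
risk_patched = 0.10   # risk after patching the most obvious problems
risk_ideal = 0.00     # risk if the remainder were also eliminated

# Assumption: value is linear in percentage points of risk removed.
first_step = risk_now - risk_patched     # patching the worst problems
second_step = risk_patched - risk_ideal  # eliminating what is left

ratio = second_step / first_step

print(f"First step removes {first_step * 100:.0f} percentage points of risk")
print(f"Second step removes {second_step * 100:.0f} percentage points of risk")
print(f"Second step is worth {ratio:.2f} of the first")
```

Under that linearity assumption, the second step removes 10 points against the first step's 40, which is where "a quarter as valuable" comes from.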