AI designs genomes from scratch & outperforms virologists at lab work. What could go wrong? | Dr Richard Moulange, CLTR
So to sum up the picture as I vaguely see it: there's a whole lot of things that we can do to try to improve refusal behavior. I imagine that with a big push, the closed source models like Claude could become quite robust against jailbreaks and quite unwilling to help with obvious production of bioweapons.
There we've got the challenge that it might be difficult to get all of the frontier models to have that refusal behavior, because we're currently seeing some companies compete on safety while others compete on speed and not on safety.
And so if you have even one model that is incredibly capable and has almost no refusal behavior, then you haven't helped all that much.
With the open source models, it's basically always going to be possible to fine-tune them to get past any reluctance they have to help.
I suppose we could try to take the knowledge out of the training data, so that it's not that they know how to do virology but have been told not to do it. It's that they simply couldn't help you even if they wanted to.
But there you've got the challenge that the data you would use to teach them virology is probably public and could probably be harvested off the internet to a great extent.
So you train a new model that's like a smaller version of the other one, but you make sure that none of the information that goes from the original one to the second one includes anything about biology or virology.