Raymond Douglas
I think it would be pretty sad to neuter all model personality, for one.
I also think that clunky interventions like training models to more firmly deny having a persona will mostly fail to help, and possibly even backfire.
Technical analogs.
Even though this post has been a bit hand-wavy, I think the topic of AI parasitology is surprisingly amenable to empirical investigation.
More specifically, there's a lot of existing technical research directions that study mechanisms similar to the ones these entities are using.
So I think there might be some low-hanging fruit in gathering up what we already know in these domains, and maybe trying to extend them to cover parasitism.
For example:
- Data poisoning — for instance, the finding that the effective dose doesn't scale with the size of the training corpus.
- Jailbreaks — for instance, that adversarial suffixes transfer fairly well between models, and that models can be fairly good at jailbreaking other models.
- Subliminal-learning-style results about behavioral transfer.
- Persona research.
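As a toy illustration of the first point (a sketch with hypothetical numbers, not the actual experimental setup): if a roughly constant number of poisoned documents suffices to implant a behavior, then the share of the corpus an attacker must control shrinks as the corpus grows.

```python
# Hypothetical constant dose of poisoned documents needed, regardless of
# corpus size. The specific number is illustrative, not from any paper.
FIXED_DOSE = 250

def attacker_share(corpus_size: int, dose: int = FIXED_DOSE) -> float:
    """Fraction of the training corpus the attacker must control."""
    return dose / corpus_size

for corpus_size in (10**6, 10**8, 10**10):
    print(f"corpus={corpus_size:>14,}  attacker share={attacker_share(corpus_size):.2e}")
```

The point is just that under a fixed-dose regime, scaling up the corpus does not dilute the attack — the required attacker share falls toward zero.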
The parasitism frame makes specific predictions, like strain differentiation, convergence on transmission-robust features, and countermeasure coevolution.
I've tried to specify what would falsify these and when we should expect to see them.
If the predictions hold, we're watching the emergence of an information-based parasitic ecology, evolving in real time in a substrate we partially control.
If they don't hold, we should look for a better frame, or conclude that the phenomenon is more random than it appears.
Thanks to AL, PT, JF, JT, DT, and TD for helpful comments and suggestions.
This article was narrated by Type 3 Audio for LessWrong.
It was published on February 16, 2026.
The original text contained three footnotes which were omitted from the narration.