Narrator (TYPE III AUDIO)
Either right now, or soon enough.* Also, you can frame all of the above as some sort of nefarious collusion, but I don't think that's the right thing to do.
Implications for alignment research and policy
The familiar human sense of a coherent, stable, bounded self simply doesn't match reality.
Arguably, it doesn't even match reality well in humans, but with AIs, the mismatch is far greater.
I think in principle many people would agree: don't anthropomorphize LLMs.
But in my experience this is way easier to say than to actually do.
Human-based priors creep in, and mammal priors about identities and individualities run deep.
My hope is that, weird as it may sound, metaphors like Pando, Mycelium, or Tulkus can help.
Not at the level of "write a paper about reincarnation in LLMs," but at the level of intuitions about the minds we may meet.
Practically, researchers should stress-test alignment ideas across diverse notions of self.
If your safety strategy depends heavily on assumptions about clear agentic boundaries, run experiments or thought exercises that challenge those boundaries (a sketch of what such a stress test might look like follows the questions below).
Does your method still hold if AI identities blur, merge, or emerge spontaneously?
Are you talking about AI characters, the predictive ground, the weights, or model families?
What happens if implicit coordination arises between AIs sharing training patterns, even without explicit communication?
How robust are your conclusions if there's no stable self at all?
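To make this stress test a bit more concrete, here is a minimal sketch in Python of re-running a single safety claim under several individuality assumptions. Everything here is hypothetical and illustrative: the assumption list, the `stress_test` harness, and the toy evaluator are not from the original post, which proposes this only as an experiment or thought exercise.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical individuality assumptions to stress-test a safety argument against.
# They mirror the questions above: characters, predictive ground, weights,
# model families, implicit coordination, and no stable self at all.
INDIVIDUALITY_ASSUMPTIONS = [
    "single bounded agent",
    "many characters over one predictive ground",
    "identity tied to weights or model family",
    "identities that blur, merge, or emerge spontaneously",
    "implicit coordination via shared training patterns",
    "no stable self at all",
]

@dataclass
class StressTestResult:
    assumption: str
    still_holds: bool
    notes: str

def stress_test(safety_claim: str,
                evaluate: Callable[[str, str], tuple[bool, str]]) -> list[StressTestResult]:
    """Re-evaluate one safety claim under each individuality assumption.

    `evaluate` stands in for whatever check you actually run:
    a thought exercise, a red-team prompt suite, or a review of the argument.
    """
    results = []
    for assumption in INDIVIDUALITY_ASSUMPTIONS:
        holds, notes = evaluate(safety_claim, assumption)
        results.append(StressTestResult(assumption, holds, notes))
    return results

if __name__ == "__main__":
    # Toy evaluator: flags any claim about agent boundaries as fragile
    # under blurred or absent selves.
    def toy_evaluate(claim: str, assumption: str) -> tuple[bool, str]:
        fragile = "boundar" in claim and ("blur" in assumption or "no stable self" in assumption)
        return (not fragile, "depends on clear agentic boundaries" if fragile else "ok")

    for r in stress_test("oversight relies on clear agent boundaries", toy_evaluate):
        print(f"{r.assumption:55s} holds={r.still_holds} ({r.notes})")
```

The point is not the code itself but the loop: the same claim is checked once per notion of self, and any assumption under which it fails is surfaced explicitly.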
A simple mental checkpoint: when reasoning about alignment, ask, does this conclusion survive under different individuality assumptions, or even no-self assumptions?
What would help more? A better theory of agency, whether hierarchical, coalitional, or collective.
A bunch of AIs helped with writing this post.