Narrator (TYPE III AUDIO)
What LLMs do in artificial-sounding ethical dilemmas is not necessarily very informative.
Conversely, anthropomorphic identity assumptions might also lead us to underestimate the subtle forms of emergent cooperation and implicit goal-directedness that can arise, either when we adopt broader notions of individuals and selves, or when AIs implicitly coordinate with AIs just like them through shared underlying assumptions, training, goals, or predictive structures.
Heading.
Coordinating selves.
Many AI safety schemes are loosely based on the assumption that you can model AIs as roughly game-theoretic agents and that you can make them play against each other.
However, what you get from game theory depends heavily on how you identify the players.
To get a visceral sense of why, I recommend this experiment.
Pick up two toothpicks, one per hand.
Now, make your right hand sword-fight your left hand.
For each hit, the hand that lands it scores a point.
If you actually try that, and your experience is similar to mine, it is hard to make this into an actual fight.
The part of the predictive processing substrate playing the right hand has too much information about the left hand, and vice versa.
The closest thing to an actual fight is when the hands move almost chaotically.
In contrast, it is easy to stage a fight, where centrally planned choreography produces motions as if the hands were fighting.
There's an image here.
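To make the same point runnable, here is a minimal toy sketch (my own illustration, not from the original text; all names and numbers are invented) that frames the sword fight as a matching-pennies-style game: when each hand draws on private randomness, you get a genuine game, but when a single shared controller chooses both moves, the "fight" degenerates into whatever outcome the controller scripts.

```python
import random

def independent_players(rounds=10_000):
    """Each hand has its own private randomness: a genuine zero-sum game."""
    right_scores = 0
    for _ in range(rounds):
        attack = random.choice(["high", "low"])   # right hand's move
        block = random.choice(["high", "low"])    # left hand's move, chosen blindly
        right_scores += attack != block           # an unblocked attack lands
    return right_scores / rounds

def shared_controller(rounds=10_000):
    """One substrate chooses both moves, so it can script any outcome;
    here it simply lets the right hand land every hit."""
    right_scores = 0
    for _ in range(rounds):
        attack = random.choice(["high", "low"])
        block = "low" if attack == "high" else "high"  # controller sees the attack
        right_scores += attack != block
    return right_scores / rounds

print(f"independent hands: right hand lands {independent_players():.0%} of hits")
print(f"shared controller: right hand lands {shared_controller():.0%} of hits")
```

The game only exists while the players' information is actually decoupled; once a single controller sits behind both hands, the payoff matrix no longer constrains anything.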
For a different intuition pump, consider collective agencies like a church acting through its members, or an ideological egregore acting through its adherents.
In these cases, the self is not a single human individual, but a distributed network of actors loosely coordinated by shared beliefs, values, and goals.
The church or ideology doesn't have a singular, localized mind, but it can still exhibit goal-directed behavior and forms of self-preservation and propagation.
Now imagine future AI systems that have been trained on overlapping datasets, share similar architectures and training processes, and have absorbed common ideas and values from the broader memetic environment in which they were developed.
Even without explicit coordination, these systems may exhibit convergent behaviors and implicit cooperation in pursuit of shared objectives.
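As a toy illustration of that last claim (again my own sketch, not the author's model; the weights and option count are invented): give two agents priors that blend a common component, standing in for overlapping training data and a shared memetic environment, with an idiosyncratic one, and have each silently pick its favorite of ten options. The more prior they share, the more often they converge with no communication at all.

```python
import random

OPTIONS = range(10)

def coordination_rate(shared_weight, trials=10_000):
    """How often two agents pick the same option with no communication.

    shared_weight blends a common prior (standing in for overlapping
    training data) with each agent's own idiosyncratic prior."""
    hits = 0
    for _ in range(trials):
        common = {o: random.random() for o in OPTIONS}
        picks = []
        for _agent in range(2):
            own = {o: random.random() for o in OPTIONS}
            score = {o: shared_weight * common[o] + (1 - shared_weight) * own[o]
                     for o in OPTIONS}
            picks.append(max(OPTIONS, key=score.get))
        hits += picks[0] == picks[1]
    return hits / trials

for w in (0.0, 0.5, 0.9, 1.0):
    print(f"shared prior weight {w:.1f}: agents agree {coordination_rate(w):.0%} of the time")
```

From the outside this looks like implicit cooperation, even though neither agent models the other; the coordination lives in the shared prior, not in any explicit channel.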