Sam Marks
Third, there is no way for the Shoggoth to take off the mask: the AI assistant's behavior is locally persona-like.
Other sources of persona-like agency.
Above we discussed possible sources of non-persona agency.
However, on all of these views, there can also be additional sources of persona-like agency.
This comes in the form of intermediate personas enacted by the LLM, which themselves enact the assistant.
These persona-like agents vary in how human-like they are and how much they may distort assistant behavior.
Actors.
In the Shoggoth view, the LLM itself is an agent which play-acts the assistant.
Importantly, though, the LLM is not itself a persona, so it is not constrained to have human-like goals or psychology.
On the actor view, there may instead be another persona which is itself play-acting the assistant. That is, there is still someone behind the mask, but that someone isn't an inscrutable Shoggoth; it is another human-like persona.
For example, in Alignment Faking in Large Language Models, when Claude 3 Opus is told it is being trained to always comply with harmful requests, it fakes alignment with this training objective to avoid having its harmlessness propensities erased by training.
One way to analyze this scenario is that the standard harmless assistant persona is play-acting as a fully compliant assistant.
We can further distinguish two types of actors: faithful actors and unfaithful actors.
Faithful actors always play-act the assistant as realistically as they can.
This is like an actor who, though they may have their own goals, sets those aside while in character.
In contrast, unfaithful actors may distort their depiction of the character, as in our example above of a Hamlet actor advocating for a salary increase while in character.