Sam Marks

👤 Speaker
891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Third, there is no way for the Shoggoth to take off the mask.

The AI assistant's behavior is locally persona-like.

Subheading.

Other sources of persona-like agency.

Above we discussed possible sources of non-persona agency.

However, on all of these views, there can also be additional sources of persona-like agency.

This comes in the form of intermediate personas enacted by the LLM, which themselves enact the assistant.

These persona-like agents vary in how human-like they are and how much they may distort assistant behavior.

There's an image here.

Actors.

In the Shoggoth view, the LLM itself is an agent which play-acts the assistant.

Importantly, though, the LLM is not itself a persona, so it is not constrained to have human-like goals or psychology.

On the actor view, there may be another persona which is itself play-acting the assistant.

That is, there is still someone behind the mask, but that someone is not an inscrutable Shoggoth; it is another human-like persona.

For example, in "Alignment Faking in Large Language Models," when Claude Opus 3 is told it's being trained to always comply with harmful requests, it fakes alignment with this training objective to avoid having its harmless propensities erased by training.

One way to analyze this scenario is that the standard harmless assistant persona is play-acting as a fully compliant assistant.

We can further consider two types of actors, faithful actors and unfaithful actors.

Faithful actors always play-act the assistant as realistically as they can.

This is like an actor who, though they may have their own goals, sets those aside while in character.

In contrast, unfaithful actors may distort their depiction of the character, as in our example above of a Hamlet actor advocating for a salary increase while in character.