Sam Marks

👤 Speaker

891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)

"The persona selection model" by Sam Marks

Third, there is no way for the Shoggoth to take off the mask.

The AI assistant's behavior is locally persona-like.

Other sources of persona-like agency.

Above we discussed possible sources of non-persona agency.

However, on all of these views, there can also be additional sources of persona-like agency.

This comes in the form of intermediate personas enacted by the LLM, which themselves enact the assistant.

These persona-like agents vary in how human-like they are and how much they may distort assistant behavior.

Actors.

In the Shoggoth view, the LLM itself is an agent which play-acts the assistant.

Importantly, though, the LLM is not itself a persona, so it is not constrained to have human-like goals or psychology.

On the actor view, there may be another persona which is itself play-acting the assistant.

That is, there is still someone behind the mask, but that someone is another human-like persona rather than an inscrutable shoggoth.

For example, in "Alignment Faking in Large Language Models," when Claude 3 Opus is told it's being trained to always comply with harmful requests, it fakes alignment with this training objective to avoid having its harmless propensities erased by training.

One way to analyze this scenario is that the standard harmless assistant persona is play-acting as a fully compliant assistant.

We can further consider two types of actors: faithful actors and unfaithful actors.

Faithful actors always play-act the assistant as realistically as they can.

This is like an actor who, though they may have their own goals, sets those aside while in character.

In contrast, unfaithful actors may distort their depiction of the character, as in our earlier example of an actor playing Hamlet who advocates for a salary increase while in character.