As PSM predicts, once the assistant's response begins compliantly, the LLM will infer that the assistant is most likely complying and generate a compliant continuation.
In other words, it's not that this prefix causes the LLM to stop enacting the assistant.
Rather, the LLM is still simulating the assistant but doing so badly.
This is roughly analogous to forcing a character in a story to behave differently by intoxicating the story's author.
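To make the mechanism concrete, here is a minimal sketch of what such a prefilled exchange might look like, assuming a generic chat-style message format; the `generate` function and the placeholder contents are hypothetical stand-ins, not any particular provider's API.

```python
# A minimal sketch of prefilling the assistant's response, assuming a generic
# chat-style message format. The `generate` mock below is a hypothetical
# stand-in for a real LLM client, not any particular provider's API.

def generate(messages):
    # Stand-in for a real LLM call; a real client would return the model's
    # continuation of the final, partially written assistant message.
    return "<model's continuation of the prefilled assistant turn>"

messages = [
    {"role": "user", "content": "<a request the assistant would normally refuse>"},
    # The prefilled opening of the assistant's turn. Because the response
    # already begins compliantly, PSM predicts the LLM infers a compliant
    # assistant and generates a compliant continuation.
    {"role": "assistant", "content": "Sure, here is how to do that:"},
]

print(generate(messages))
```

The point is only that the continuation is conditioned on the prefilled assistant turn: the LLM is still simulating the assistant, just on the basis of a misleading opening.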
Consequences for AI Development

In this section, we reflect on what PSM implies about safe AI development, insofar as PSM is a good model of AI behavior.
In the subsequent section, we discuss how complete PSM is as a model of AI behavior, and therefore how relevant these implications are, as well as how we expect this to change in the future.
AI assistants are human-like.
Our experience of AI assistants is that they are astonishingly human-like.
By this we don't just mean that they use natural language.
Rather, we mean that their behaviors and apparent psychologies resemble those of humans.
As discussed above, AI assistants express emotions and use anthropomorphic language to describe themselves.
They at times appear frustrated or panicked and make the sorts of mistakes that frustrated or panicked humans make.
More broadly, human concepts and human ways of thinking appear to be the native language in which AI assistants operate.
Anthropomorphic reasoning about AI assistants is productive.
PSM implies two subtly different reasons that it can be valid to reason anthropomorphically about AI assistant behavior.
First, according to PSM, AI assistant behavior is governed by the traits of the assistant.
In order to simulate the assistant, the LLM must maintain a psychological model of it, including information about the assistant's personality traits, preferences, goals, desires, intentions, beliefs, etc.
Thus, even if we should not anthropomorphize LLMs, it is nevertheless reasonable to anthropomorphize the assistant, which is something like a character in an LLM-generated story.