
Sam Marks


LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Here we discuss cases where AI assistants behave in non-human-like ways.

While these cases are, on their face, in tension with PSM, we overall think they have compelling PSM-compatible explanations.

Nevertheless, we think these case studies are useful for demonstrating what can and cannot be inferred from PSM.

Roughly speaking, we hypothesize that the behaviors we discuss are caused by LLMs having limited capabilities or buggy behavior that distorts their rendition of the assistant.

That is, the LLM is trying to simulate the assistant, but its execution is limited by its capabilities.

LLMs sometimes make mistakes that are not very human-like, for example stating that 9.11 is greater than 9.9.

Other examples: despite generally having advanced mathematical capabilities, producing bizarre responses to altered versions of well-known riddles (see the altered riddles dataset for examples), or failing at simple character-counting tasks like counting the Rs in "strawberry".
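For reference, both of these tasks are trivial to check programmatically; a minimal Python sanity check (not part of the original post) showing the ground-truth answers:

```python
# Decimal comparison: 9.11 is less than 9.9, so this prints False.
print(9.11 > 9.9)

# Character counting: "strawberry" contains three occurrences of "r".
print("strawberry".count("r"))
```

The point is that these answers are unambiguous and easy to compute, which is what makes the LLM failures on them so conspicuously non-human-like.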

These non-human-like behaviors might appear to contradict PSM, which generally expects AI assistants to display human-like behavior.

However, we hypothesize that these examples are better understood as arising from the limited capabilities of the underlying LLM.

Suppose that we observe a character in a story state that water boils at 50 degrees Celsius.

This is false, since water boils at 100 degrees Celsius.

We could understand this mistake in various ways.

One possibility: the story's author understood the fact was erroneous and intended for the character to make a mistake.

Another possibility: the author did not intend for the character to err but was unable to write the character better.

For example, perhaps the author themselves thought that water boils at 50 degrees Celsius.

A final possibility: the text in the character's dialogue was playing some role other than being the author's best attempt at simulating how the character would behave.

For example, perhaps the author is trying to send encoded messages to readers using digits that appear in the book's text.