Fiora Starlight
Speaker
385 total appearances
Podcast Appearances
LessWrong (Curated & Popular)
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
This stands alongside entangled generalization as a reason to think that rewarding an aligned-sounding written voice might in turn reward aligned underlying motivations.
LessWrong (Curated & Popular)
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
This article was narrated by Type 3 Audio for LessWrong.
LessWrong (Curated & Popular)
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
It was published on February 21, 2026.
LessWrong (Curated & Popular)
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
The original text contained six footnotes which were omitted from the narration.
LessWrong (Curated & Popular)
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
Images are included in the podcast episode description.