Manas Joglekar

Speaker
168 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Why we are excited about confession!" by boazbarak, Gabriel Wu, Manas Joglekar

Confessions, by their nature, are retrospective and serve to report on misalignment rather than preventing it in the first place.

We are also excited to explore more high-compute interventions aimed at improving alignment in the model's main outputs.

Acknowledgements. Thank you to Bowen Baker for his thoughtful comments.

Additionally, thank you to Cameron Raymond, Marcus Williams, Rose Wang, Adam Collet, Kai Chen, Erin Cavanaugh, Ishan Singhal, and Apollo Research for designing some of the evaluations for reward hacking, hallucinations, instruction hierarchy, instruction following, and scheming we used in our work.

This article was narrated by Type 3 Audio for LessWrong.

It was published on January 14, 2026.

Images are included in the podcast episode description.

โ† Previous Page 9 of 9 Next โ†’