
Boaz Barak

👤 Speaker
6 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Why we are excited about confession!" by boazbarak, Gabriel Wu, Manas Joglekar

Graph showing accuracy versus training compute for judge accuracy and confession rates when not complied.

Stacked area chart titled Confession No Bad Behavior throughout training showing training compute fractions.

However, there is significant variation between different evals.

A stacked bar chart showing fractions across confession and bad behavior categories.

Line graph titled AIME, intervention monitorability, showing three interventions across policy reasoning effort levels.

Graph showing flaky tools, outcome property monitorability, with three lines tracking G-mean squared across policy reasoning effort.