Boaz Barak
๐ค SpeakerAppearances Over Time
Podcast Appearances
Graph showing accuracy versus training compute for judge accuracy and confession rates when not complied.
Stacked area chart titled Confession No Bad Behavior throughout training showing training compute fractions.
However, there is significant variation between different evals.
A stacked bar chart showing fractions across confession and bad behavior categories.
Line graph titled AIME, intervention monitorability, showing three interventions across policy reasoning effort levels.
Graph showing flaky tools, outcome property monitorability, with three lines tracking G-means squared across policies reasoning effort.