Jacob Drori
There's an image here.
I think it is morally correct to freeze LayerNorm during pruning so that the model cannot cheat in the way described above, but it seems doing so does not fully fix the faithfulness issues: see the IOI [complex formula omitted from the narration] results directly above.
A final caveat to the results in this appendix.
For each model and task, I performed a CARBS sweep to find the best hyperparameters for pruning, and then used these best hyperparameters for each of the 100 randomly seeded pruning runs.
It may be the case that, for example, for the [complex formula omitted from the narration], we happened to find unlucky hyperparameters that lead to poor generalization to same-gender prompts, whereas we got lucky with the hyperparameters we found for the [complex formula omitted from the narration] model.
In other words, the 100 seeds are perhaps not as decorrelated as we'd like.
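To make this caveat concrete, here is a toy simulation rather than anything from the actual experiments. It assumes each run's score is a shared offset set by the hyperparameters the sweep happened to pick, plus independent per-seed noise. Every name and number in it (`hp_luck_sd`, `seed_noise_sd`, the base score of 0.5) is made up for illustration. Under those assumptions, the standard error computed from 100 seeds that share one set of hyperparameters can be much smaller than the spread you would see if the sweep itself were repeated.

```python
import random
import statistics


def seeded_runs(hp_offset, rng, n_seeds=100, seed_noise_sd=0.05, base=0.5):
    """Toy scores for n_seeds pruning runs that all share one hyperparameter offset."""
    return [base + hp_offset + rng.gauss(0.0, seed_noise_sd) for _ in range(n_seeds)]


def naive_standard_error(scores):
    """Standard error of the mean, treating the runs as fully independent."""
    return statistics.stdev(scores) / len(scores) ** 0.5


def spread_across_sweeps(rng, n_sweeps=200, hp_luck_sd=0.2):
    """Std of the 100-seed mean when the hyperparameter sweep itself is redone."""
    means = []
    for _ in range(n_sweeps):
        hp_offset = rng.gauss(0.0, hp_luck_sd)  # hypothetical "luck" of one sweep
        means.append(statistics.mean(seeded_runs(hp_offset, rng)))
    return statistics.stdev(means)


rng = random.Random(0)

# One sweep, then 100 seeds with the same hyperparameters (the setup described above).
one_sweep = seeded_runs(hp_offset=rng.gauss(0.0, 0.2), rng=rng)
naive_se = naive_standard_error(one_sweep)

# The same procedure repeated with a fresh sweep each time.
true_spread = spread_across_sweeps(rng)

print(f"naive SE from 100 shared-hyperparameter seeds: {naive_se:.4f}")
print(f"spread of the mean across repeated sweeps:     {true_spread:.4f}")
```

In this toy setting the seed-to-seed noise averages out over the 100 runs, but the hyperparameter offset does not, which is the sense in which the seeds are correlated.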
This article was narrated by TYPE III AUDIO for LessWrong.
It was published on February 9, 2026.
The original text contained six footnotes which were omitted from the narration.
Images are included in the podcast episode description.