Dwarkesh Patel
Labelbox helps you train models not just to get the right answer, but to think the right way.
They've operationalized these reasoning behaviors into rubrics, giving you the ability to evaluate every important dimension of a model's output.
These rubrics go beyond simple correctness.
Did the model reach for the right tools?
Did it check its own work and explore alternative paths?
How clear was its response?
These skills are useful across domains: math, physics, finance, psychology, and more.
And they're becoming increasingly important as models take on harder, open-ended problems, some of which have multiple solutions and some of which we don't even know the solutions to.
Labelbox can get you rubrics tailored to your domain, helping you systematically measure and shape how your models think.
Learn more at labelbox.com.
One big question I have is: how plausible is it that if we just keep training AIs to get better and better at solving problems in Lean, they will continue to solve more and more impressive problems, and then we will, in retrospect, be surprised at how little insight we got from some Lean solution to proving the Riemann hypothesis or something?
Or do you think it is a necessary condition, even for a proof of the Riemann hypothesis done entirely in Lean by an AI, that the constructions which are made and the definitions which are created in the Lean program have to advance our understanding of mathematics?
Or do you think it could just be assembly-code gobbledygook?
Suppose the AI figures it out, and latent in the Lean code is some brand-new construction which, if we realized its significance, we would be able to apply in all these different situations.
How would we even recognize it, right?
Like, again, a very naive question, but suppose you come up with the equivalent of Descartes coming up with this idea: oh, you can have this coordinate system that unifies algebra and geometry. In Lean code it would just look like R to R, and it wouldn't look that significant or something.
Or similarly,