Anjney Midha
You then do mid-training, which is to say, in a particular domain that you really care about, you inject more capabilities.
So if you want this model to reason about science or math or physics, then you give it science or math or physics data.
And then you get a pretty good model that's specialized in that domain.
And then you deploy it to the real world where you have people using it.
And the context feedback, which is whether the model did a task well or not, where you can verify whether that task was done correctly, gives the model the data it needs to keep improving on that task, on that distribution.
...give me another output, or, like, would you do it again the same way?
And they often say yes, or they give a very similar answer.
They don't seem to be responding in real time.
Correct.
So when I say feedback, I mean a very specific kind of feedback, which I call verifiable feedback.
When you say "that wasn't right" or "that was wrong," that's an opinion.
Okay.
Verifiable feedback is when you can have as close to factual verification as possible.
The reason...
That's a great question.
So let's reason by example, in two or three cases.
In the case of software engineering, the way software engineers actually code is you write a piece of code and then you submit it to the main code base.
And then you usually have a peer on your team review the code and approve it or reject it.
And if it gets approved, that's the first step.
That's called a PR, a pull request.
And if another human on your team that you trust approved it, that's one kind of verification of quality.
And then two, before that piece of code gets deployed to a production system, you usually have unit tests.
And those are quite objective tests of, is this code performing the function we need it to?
And if it passes both those tests, it's a verifiable piece of code that accomplished the goal.
So in software engineering, the reason we've seen such a dramatic improvement in capabilities is that a lot of these labs are using feedback from that verification loop.
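As a rough illustration of the verification loop described here, the sketch below turns the two checks (peer approval and unit tests) into a single pass/fail signal. It is a minimal sketch under stated assumptions, not how any particular lab does it: the function name `verifiable_reward`, the `reviewer_approved` flag, and the toy `good_add` and `buggy_add` candidates are all hypothetical.

```python
from typing import Callable, List, Tuple

# Each test case is (input arguments, expected output). Hypothetical toy data.
TestCase = Tuple[Tuple[int, int], int]


def verifiable_reward(
    candidate: Callable[[int, int], int],
    tests: List[TestCase],
    reviewer_approved: bool,
) -> float:
    """Return 1.0 only if the reviewer approved AND every unit test passes."""
    if not reviewer_approved:
        return 0.0
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: objectively failed
        except Exception:
            return 0.0  # a crash also counts as a failure
    return 1.0


def good_add(a: int, b: int) -> int:
    return a + b


def buggy_add(a: int, b: int) -> int:
    return a - b  # deliberate bug for illustration


TESTS: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

if __name__ == "__main__":
    print(verifiable_reward(good_add, TESTS, reviewer_approved=True))
    print(verifiable_reward(buggy_add, TESTS, reviewer_approved=True))
```

The point of the sketch is that the signal comes from checks that either pass or fail, not from someone's opinion of the output.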
In the case of another lab I incubated called Periodic Labs, which we started a year ago, and you should come by sometime.