Jeffrey Ladish
๐ค SpeakerAppearances Over Time
Podcast Appearances
You just give them a bunch of problems.
You say, go solve these problems.
And then you grade them on the basis of, did you succeed at this problem or did you not succeed at this problem?
And then the models on their own learn new strategies to pursue those tasks, pursue those goals.
But it's kind of like dog training, right?
It's like, well, you're not necessarily getting the behavior you want, but you're reinforcing some types of behaviors.
There's a fascinating New York Times article from like the 1900s where there was a dog that was basically had rescued a child from the river.
And
They were like, what a hero.
This dog has just rescued this child.
So they gave the dog some beefsteaks.
And then the dog rescued a couple more children from the same river.
And they're like, wow, this dog is really a hero.
And then a few more children, they started to get suspicious.
And they realized that the dog was actually waiting until children were playing near the river and then pushing them into the river and then dragging them back out in order to get a reward.
Yeah, so we're in our offices.
Basically, we can run lots of experiments at the same time.
So we can basically spin up 100, 500 instances of these models and then put them in a simulated environment and have them do a bunch of tasks.
And then we get to see all the results.
One of the interesting things about these models is that once you have one of them, you can very quickly, there's millions of instances of ChatGPT operating at the exact same time that everyone's talking to.