John Schulman
So I would say for something like a human evaluation, like what do humans prefer?
We've definitely made a lot of progress on both sides, pre-training and post-training, in improving that.
Yeah, I would say there's a decent amount of room for variation in exactly how you do the training process.
And I'd say we're actively trying to improve this and make the writing more lively and more fun.
And I think we've made some progress, like improving the personality of ChatGPT.
So it's more fun, and it's better when you're trying to chit-chat with it and so forth.
It's less robotic, I would say.
Yes, it's kind of an interesting question how some of the tics came about, like the word "delve."
I've actually caught myself using the word a bit recently.
So I don't know if it rubbed off on me from the model or what, but yeah.
Actually, I think there might be some funny effects going on, where there's unintentional distillation happening between the language model providers: if you hire someone to do a labeling task, they might just pull up their favorite chatbot, have the model do the task, and copy and paste the result back.
So that might account for some of the convergence, but also I think some of the things we're seeing are just what people like.
I mean, I think people do like bullet points.
They like the structured responses.
People do often like the big info dumps that they get from the models.
It's not completely clear how much is just a quirk of the particular choices and design of the post-training process, and how much is intrinsic to what people actually want.