Andrej Karpathy
Or maybe if you're getting a lot of writing help from LLMs and stuff like that, I think it's probably bad, because the models will silently give you all the same stuff, you know.
So they won't explore lots of different ways of answering a question, right?
But I kind of feel like maybe this diversity just isn't needed by that many applications, so the models don't have it. But then it's actually a problem at synthetic generation time, et cetera.
So we're actually shooting ourselves in the foot by not allowing this entropy to be maintained in the model.
And I think possibly the labs should try harder.
I don't actually know if it's super fundamental.
I don't actually know whether it's intentional.
I haven't done these experiments, but I do think that you could probably regularize the entropy to be higher.
So you'd be encouraging the model to give you more and more diverse solutions.
But you don't want it to start deviating too much from the training data.
It's going to start making up its own language.
It's going to start using words that are extremely rare.
So it's going to drift too much from the distribution.
So I think controlling the distribution is just tricky. Someone just has to work on it.
It's probably not trivial in that sense.
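The idea sketched above, adding an entropy bonus while penalizing drift from the training distribution, can be written down concretely. This is a minimal illustrative sketch in PyTorch, not anything the labs are confirmed to do: the coefficients and the frozen reference model are assumptions, and the KL term stands in for "don't deviate too much from the training data."

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(logits, targets, ref_logits,
                             entropy_coef=0.01, kl_coef=0.1):
    """Cross-entropy with an entropy bonus and a KL anchor.

    - The entropy bonus encourages the model to spread probability
      over more diverse answers.
    - The KL penalty to a frozen reference model discourages drifting
      so far that the model starts using extremely rare words or
      making up its own language.

    logits:     (batch, vocab) scores from the model being trained
    ref_logits: (batch, vocab) scores from a frozen reference model
    Coefficients are illustrative, not tuned values.
    """
    ce = F.cross_entropy(logits, targets)

    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    entropy = -(p * log_p).sum(dim=-1).mean()

    log_q = F.log_softmax(ref_logits, dim=-1)
    kl = (p * (log_p - log_q)).sum(dim=-1).mean()

    # Subtract entropy (we want it high), add KL (we want it low).
    return ce - entropy_coef * entropy + kl_coef * kl
```

The tension Karpathy describes lives in the two coefficients: push `entropy_coef` too high and the KL term is all that stops the model from degenerating into rare-token noise.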
So it's really interesting in the history of the field, because at one point everything was very scaling-pilled, in terms of: oh, we're going to make much bigger models, trillion-parameter models.
And actually what the models have done in size is they've gone up and now they've actually kind of like