Andrej Karpathy
because there's lots of examples of it in the training sets of these models.
So there's features of things where the models will do very well.
I would say NanoChat is not an example of this, because it's a fairly unique repository.
There's not that much code, I think, in the way that I've structured it.
And it's not boilerplate code.
It's actually almost intellectually intense code.
And everything has to be very precisely arranged.
And the models, I mean, they have so many cognitive deficits, right?
So one example: they keep misunderstanding the code because they have too much memory of all the typical ways of doing things on the internet that I just wasn't adopting.
So the models, for example, I mean, I don't know if I want to get into the full details, but they keep thinking I'm writing normal code and I'm not.
Maybe one example: we have eight GPUs that are all doing forward and backward passes.
The typical way to synchronize gradients between them is to use PyTorch's Distributed Data Parallel (DDP) container, which automatically starts communicating and synchronizing gradients as you're doing the backward pass.
I didn't use DDP because it's not necessary.
So I threw it out.
And I basically wrote my own synchronization routine that's inside the step of the optimizer.
And so the models were trying to get me to use the DDP container, and they were very concerned about it. Okay, this gets way too technical, but I wasn't using that container because I don't need it, and I have a custom implementation of something like it.
Yeah, they couldn't get past that.
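[Editor's note: To make the idea above concrete, here is a minimal toy sketch of gradient synchronization that lives inside the optimizer's step rather than in a wrapper around the model. It is a plain-Python simulation with no GPUs, no PyTorch, and no real communication; it is not the NanoChat implementation, and the names `all_reduce_mean` and `SyncSGD` are hypothetical, chosen only for illustration.]

```python
# Toy simulation only: "ranks" are entries in a list, and the all-reduce is a
# plain average. The names all_reduce_mean and SyncSGD are hypothetical.

def all_reduce_mean(per_rank_grads):
    """Average one gradient vector across ranks, as a mean all-reduce would."""
    world_size = len(per_rank_grads)
    return [sum(vals) / world_size for vals in zip(*per_rank_grads)]


class SyncSGD:
    """Plain SGD whose step() first synchronizes gradients across ranks."""

    def __init__(self, params, lr):
        # One shared copy of the weights stands in for identical replicas.
        self.params = params
        self.lr = lr

    def step(self, per_rank_grads):
        # The synchronization happens here, inside the optimizer step,
        # instead of being triggered by a container during the backward pass.
        avg = all_reduce_mean(per_rank_grads)
        self.params = [p - self.lr * g for p, g in zip(self.params, avg)]
        return self.params


# Eight simulated ranks, each holding a different local gradient
# for the same two parameters.
grads = [[float(rank), 2.0 * rank] for rank in range(8)]
opt = SyncSGD(params=[1.0, 1.0], lr=0.1)
new_params = opt.step(grads)
```

[In a real multi-GPU run, the averaging would be a collective operation across processes rather than a loop over a list; the sketch only shows where the synchronization sits.]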
And then they kept trying to mess up the style. They're way too over-defensive: they make all these try-catch statements, and they keep trying to make a production code base. I have a bunch of assumptions in my code, and it's okay. I don't need all this extra stuff in there. So I just kind of feel like they're bloating the code base, they're bloating the complexity, they keep misunderstanding, and they're using deprecated APIs a bunch of times. So it's a total mess, and it's just not that useful.