Andrej Karpathy

Speaker
3433 total appearances

Podcast Appearances

Dwarkesh Podcast
Andrej Karpathy - AGI is still a decade away

And everything has to be very precisely arranged.

And the models, I mean, they have so many cognitive deficits, right?

So one example: they keep misunderstanding the code because they have too much memory of all the typical ways of doing things on the internet that I just wasn't adopting.

So the models, for example, I mean, I don't know if I want to get into the full details, but they keep thinking I'm writing normal code and I'm not.

Maybe one example is the way to synchronize. So we have eight GPUs that are all doing forward and backward passes.

The way to synchronize gradients between them is to use the DistributedDataParallel (DDP) container of PyTorch, which automatically, as you're doing the backward pass, starts communicating and synchronizing gradients.

I didn't use DDP because I didn't want to; it's not necessary.

So I threw it out.

And I basically wrote my own synchronization routine that's inside the step of the optimizer.

And so the models were trying to get me to use the DDP container, and they were very concerned about, okay, this gets way too technical, but I wasn't using that container because I don't need it, and I have a custom implementation of something like it.

Yeah, they couldn't get past that.
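
To make the idea of synchronizing gradients inside the optimizer step concrete, here is a minimal sketch. It is not the speaker's routine, which the transcript does not show. It is a toy, single-process simulation in plain Python with two simulated ranks instead of eight GPUs, and every name in it (`all_reduce_mean`, `SyncingSGD`) is hypothetical. In real PyTorch code the averaging would typically be done with a collective such as `torch.distributed.all_reduce`.

```python
# Toy, single-process simulation of data-parallel training: no GPUs, no PyTorch.
# All names here are hypothetical and exist only for illustration.


def all_reduce_mean(per_rank_grads):
    """Average gradients element-wise across ranks; every rank gets the mean."""
    world_size = len(per_rank_grads)
    n_params = len(per_rank_grads[0])
    mean = [
        sum(grads[i] for grads in per_rank_grads) / world_size
        for i in range(n_params)
    ]
    return [list(mean) for _ in range(world_size)]


class SyncingSGD:
    """Plain SGD whose step() first synchronizes gradients across ranks."""

    def __init__(self, per_rank_params, lr):
        self.per_rank_params = per_rank_params  # one parameter list per rank
        self.lr = lr

    def step(self, per_rank_grads):
        # The synchronization lives here, inside the optimizer step,
        # instead of in a wrapper container around the model.
        synced = all_reduce_mean(per_rank_grads)
        for params, grads in zip(self.per_rank_params, synced):
            for i, g in enumerate(grads):
                params[i] -= self.lr * g


# Two simulated ranks start from identical parameters but see different data,
# so their local gradients differ before synchronization.
params = [[1.0, 2.0], [1.0, 2.0]]
opt = SyncingSGD(params, lr=0.5)
opt.step([[0.5, 1.0], [1.5, 3.0]])
print(params)  # [[0.5, 1.0], [0.5, 1.0]]
```

Because every rank applies the same averaged gradient, the parameter copies stay identical after the step, which is the property the DDP container otherwise provides.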

And then they kept trying to mess up the style. They're way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it's okay. I don't need all this extra stuff in there. So I just kind of feel like they're bloating the code base, they're bloating the complexity, they keep misunderstanding, they're using deprecated APIs a bunch of times. So it's a total mess,

and it's just not that useful.

I can go in and I can clean it up, but it's not that useful.
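
The contrast between over-defensive code and code that simply states its assumptions can be illustrated with a small sketch. Both functions below are hypothetical examples written for this purpose, not code from the project being discussed. The first wraps everything in try/except and silently falls back to a default. The second assumes a non-empty list of numbers and says so with a single assertion.

```python
# Hypothetical helpers; a contrast in style, not code from the project discussed.


def mean_loss_defensive(losses):
    """Over-defensive version: swallows problems and returns a fallback value."""
    try:
        if losses is None or len(losses) == 0:
            return 0.0
        return sum(float(x) for x in losses) / len(losses)
    except (TypeError, ValueError):
        return 0.0


def mean_loss_lean(losses):
    """Lean version: assumes a non-empty list of numbers and states it."""
    assert len(losses) > 0, "expected at least one loss value"
    return sum(losses) / len(losses)


print(mean_loss_lean([1.0, 2.0, 3.0]))  # 2.0
```

The lean version is shorter and fails loudly when its assumption is violated, whereas the defensive version hides bad input behind a plausible-looking `0.0`.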

I also feel like it's kind of annoying to have to, like, type out what I want in English because it's just too much typing.

Like, if I just navigate to the part of the code that I want and I go where I know the code has to appear and I start typing out the first three letters, autocomplete gets it and just gives you the code.

And so I think this is a very high-bandwidth way to specify what you want.

If you point to the code where you want it and you type out the first few pieces, the model will complete it.

So I guess what I mean is...