Andrej Karpathy

So one example: they keep misunderstanding the code because they have too much memory of all the typical ways of doing things on the internet that I just wasn't adopting.

1889.428

Dwarkesh Podcast

Andrej Karpathy — AGI is still a decade away

So the models, for example, I mean, I don't know if I want to get into the full details, but they keep thinking I'm writing normal code and I'm not.

1900.062

Maybe one example.

1908.323

Maybe one example is the way to synchronize. So we have eight GPUs that are all doing forward and backward passes.

1909.364

The way to synchronize gradients between them is to use the Distributed Data Parallel (DDP) container of PyTorch, which does it all automatically: as you're doing the backward pass, it will start communicating and synchronizing gradients.

1914.89

I didn't use DDP. I didn't want to use it, because it's not necessary.

1924.08

So I threw it out.

1928.164

And I basically wrote my own synchronization routine that's inside the step of the optimizer.

1929.285

And so the models were trying to get me to use the DDP container, and they were very concerned about, okay, this gets way too technical, but I wasn't using that container because I don't need it, and I have a custom implementation of something like it.

1933.67

Yeah, they couldn't get past that.
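The idea described above, synchronizing gradients inside the optimizer's step rather than through a DDP wrapper, can be sketched roughly as follows. This is a toy, single-process simulation under stated assumptions, not the code discussed in the episode: the names `SyncedSGD` and `all_reduce_mean` are hypothetical, the "ranks" are plain Python lists, and the averaging function stands in for a real collective such as `torch.distributed.all_reduce` followed by a division by the world size.

```python
# Toy, single-process simulation of "gradient sync inside optimizer.step()".
# SyncedSGD, all_reduce_mean, and the list-of-ranks layout are hypothetical
# names chosen for illustration only.

def all_reduce_mean(per_rank_values):
    """Stand-in for a collective all-reduce: every rank ends up with the mean."""
    world_size = len(per_rank_values)
    n = len(per_rank_values[0])
    mean = [sum(v[i] for v in per_rank_values) / world_size for i in range(n)]
    return [list(mean) for _ in range(world_size)]


class SyncedSGD:
    """Plain SGD whose step() first averages gradients across simulated ranks."""

    def __init__(self, per_rank_params, lr):
        self.per_rank_params = per_rank_params  # one parameter list per rank
        self.lr = lr

    def step(self, per_rank_grads):
        # 1) Synchronize: replace each rank's local gradient with the mean.
        synced = all_reduce_mean(per_rank_grads)
        # 2) Update: each rank applies the same averaged gradient locally.
        for params, grads in zip(self.per_rank_params, synced):
            for i, g in enumerate(grads):
                params[i] -= self.lr * g


# Two simulated ranks start from identical weights but see different data,
# so their local gradients differ before synchronization.
params = [[1.0, 2.0], [1.0, 2.0]]
opt = SyncedSGD(params, lr=0.5)
opt.step([[1.0, 0.0], [3.0, 2.0]])
print(params)  # [[0.0, 1.5], [0.0, 1.5]]
```

Because every rank applies the same averaged gradient, the replicas stay identical after the step without any wrapper around the model. What this sketch does not show is the overlap of communication with the backward pass that the DDP container provides.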

1947.216

And then they kept trying to mess up the style. They're way too overdefensive: they make all these try-catch statements, they keep trying to make a production code base. I have a bunch of assumptions in my code, and it's okay. I don't need all this extra stuff in there. So I just kind of feel like they're bloating the code base, they're bloating the complexity, they keep misunderstanding, and they're using deprecated APIs a bunch of times. So it's a total mess.

1949.38

And it's just not that useful.
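To make the style complaint concrete, here is a small contrast between the over-defensive pattern and a leaner version that simply states its assumption. Both functions and their names are hypothetical, written only to illustrate the point; they are not taken from the code base being discussed.

```python
# Hypothetical helpers; both compute the mean of a list of loss values.

def mean_loss_defensive(losses):
    """Over-defensive style: every assumption is re-checked and errors are swallowed."""
    try:
        if losses is None:
            return None
        if not isinstance(losses, (list, tuple)):
            return None
        if len(losses) == 0:
            return None
        return sum(losses) / len(losses)
    except Exception:
        return None


def mean_loss_lean(losses):
    """Lean style: state the assumption once and fail loudly if it breaks."""
    assert len(losses) > 0, "expected at least one loss value"
    return sum(losses) / len(losses)


print(mean_loss_defensive([1.0, 2.0, 3.0]), mean_loss_lean([1.0, 2.0, 3.0]))  # 2.0 2.0
```

The lean version is shorter and surfaces a broken assumption immediately, while the defensive version silently returns `None` and pushes the failure somewhere else.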

1974.828