Andrej Karpathy
Podcast Appearances
How do you put them together?
How do you do the fusion of all that information?
And how do you act on it?
All of that was written by humans in C++.
And then we decided, okay, we don't actually want to do all of that fusion in C++ code because we're actually not good enough to write that algorithm.
We want the neural nets to write the algorithm.
And we want to port all of that software into the 2.0 stack.
And so then we actually had neural nets that now take all the eight camera images simultaneously and make predictions for all of that.
And actually, they don't make predictions in the space of images.
They now make predictions directly in 3D.
In three dimensions around the car.
And now, actually, we don't manually fuse the predictions in 3D over time.
We don't trust ourselves to write that tracker.
So actually, we give the neural net the information over time.
So it takes these videos now and makes those predictions.
And so you're sort of just like putting more and more power into the neural network processing.
And at the end of it, the eventual sort of goal is to have most of the software potentially be in the 2.0 land because it works significantly better.
Humans are just not very good at writing software, basically.
I would say by far in the industry, if you're talking about the industry and what is the technology of what we have available, everything is supervised learning.
So you need a data set of input, desired output, and you need lots of it.
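To make the "data set of input, desired output" idea concrete, here is a minimal sketch in plain Python. It is a toy illustration, not anything from the actual Autopilot stack: the two simulated sensors, their noise levels, and the function names (`make_dataset`, `hand_written_fusion`, `train_fusion`, `mse`) are all assumptions made up for this example. It compares a hand-written fusion rule with one whose weights are learned from labeled pairs.

```python
import random


def make_dataset(n, seed=0):
    """Build n (input, desired_output) pairs.

    Each input is two noisy readings of the same true value; the desired
    output is the true value. Noise levels are arbitrary assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        truth = rng.uniform(0.0, 10.0)
        a = truth + rng.gauss(0.0, 0.1)  # hypothetical accurate sensor
        b = truth + rng.gauss(0.0, 2.0)  # hypothetical noisy sensor
        data.append(((a, b), truth))
    return data


def hand_written_fusion(a, b):
    """'1.0'-style rule: a human decides to average the two readings."""
    return 0.5 * (a + b)


def train_fusion(data, epochs=500, lr=0.005):
    """'2.0'-style rule: learn the fusion weights by gradient descent
    on mean squared error over the labeled pairs."""
    w_a, w_b = 0.5, 0.5  # start from the hand-written rule
    n = len(data)
    for _ in range(epochs):
        g_a = g_b = 0.0
        for (a, b), y in data:
            err = w_a * a + w_b * b - y
            g_a += 2.0 * err * a / n
            g_b += 2.0 * err * b / n
        w_a -= lr * g_a
        w_b -= lr * g_b
    return w_a, w_b


def mse(predict, data):
    """Mean squared error of a fusion function on a dataset."""
    return sum((predict(a, b) - y) ** 2 for (a, b), y in data) / len(data)


if __name__ == "__main__":
    train = make_dataset(500, seed=0)
    held_out = make_dataset(500, seed=1)
    w_a, w_b = train_fusion(train)
    print("learned weights:", w_a, w_b)
    print("hand-written MSE:", mse(hand_written_fusion, held_out))
    print("learned MSE:", mse(lambda a, b: w_a * a + w_b * b, held_out))
```

In this toy setup the learned rule ends up leaning on the more accurate sensor, something the hand-written average cannot do without a person working out the right weights. The same pattern also shows why the data requirement matters: the learned rule is only as good as the labeled pairs it is fit to.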