Andrej Karpathy
And this neural net didn't have too much to do in the scope of the software.
It was making tiny predictions on individual little images.
And then the rest of the system stitched it up.
So, okay, we don't have just a single camera, we have eight cameras.
We actually have eight cameras over time.
And so what do you do with these predictions?
How do you put them together?
How do you do the fusion of all that information?
And how do you act on it?
All of that was written by humans in C++.
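To make the hand-written stage concrete, here is a minimal Python sketch of the kind of fusion rule a person might write by hand. The speaker says the real code was C++ and does not describe its contents, so everything here is a hypothetical illustration: the `Prediction` fields, the `fuse` function, and the confidence threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    """One per-image prediction (hypothetical fields, for illustration only)."""
    camera_id: int      # which of the eight cameras produced it
    distance_m: float   # assumed per-image estimate of distance to an object
    confidence: float   # assumed score in [0, 1]


def fuse(predictions: List[Prediction], min_confidence: float = 0.5) -> Optional[float]:
    """Hand-written rule: confidence-weighted mean of the trusted predictions."""
    trusted = [p for p in predictions if p.confidence >= min_confidence]
    if not trusted:
        return None  # no camera is confident enough to act on
    total = sum(p.confidence for p in trusted)
    return sum(p.distance_m * p.confidence for p in trusted) / total


fused = fuse([
    Prediction(camera_id=0, distance_m=10.0, confidence=1.0),
    Prediction(camera_id=1, distance_m=20.0, confidence=1.0),
    Prediction(camera_id=2, distance_m=100.0, confidence=0.1),  # dropped by the threshold
])
```

Every choice in this rule (the threshold, the weighting, what to do when no camera is confident) is a human decision, which is the part the speaker goes on to say they wanted to stop writing by hand.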
And then we decided, okay, we don't actually want to do all of that fusion in C++ code because we're actually not good enough to write that algorithm.
We want the neural nets to write the algorithm.
And we want to port all of that software into the 2.0 stack.
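As a rough illustration of letting the data write the algorithm, the sketch below learns fusion weights by gradient descent instead of fixing them by hand. It is a toy linear model in plain Python, not the network described in the talk; the function name `learn_fusion_weights`, the two-camera data, and the 0.7/0.3 ground-truth weights are all assumptions made up for the example.

```python
from typing import List


def learn_fusion_weights(samples: List[List[float]], targets: List[float],
                         lr: float = 0.1, steps: int = 500) -> List[float]:
    """Fit one weight per camera by gradient descent on mean squared error."""
    n = len(samples[0])
    weights = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in zip(samples, targets):
            # Error of the current fused estimate on this sample.
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for i in range(n):
                grad[i] += 2.0 * err * x[i] / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy data: two cameras, with the target generated as 0.7 * cam0 + 0.3 * cam1.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [0.7 * a + 0.3 * b for a, b in samples]
weights = learn_fusion_weights(samples, targets)
```

Here nobody writes down how much to trust each camera; the weights come out of the optimization. A real system would replace the linear model with a neural network and the toy data with labeled driving data.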
And so then we actually had neural nets that now take all the eight camera images simultaneously and make predictions for all of that.
And actually, they don't make predictions in the space of images.
They now make predictions directly in 3D.
That is, in three dimensions around the car.
And now, actually, we don't manually fuse the predictions in 3D over time.
We don't trust ourselves to write that tracker.
So actually, we give the neural net the information over time.