George Hotz
All right, Python has Turing completeness, and then we take Python, we go into C++, which is Turing complete, and maybe C++ calls into some CUDA kernels, which are Turing complete. The CUDA kernels go through LLVM, which is Turing complete, into PTX, which is Turing complete, to SASS, which is Turing complete, on a Turing complete processor. I wanna get Turing completeness out of the stack entirely.
Because once you get rid of Turing completeness, you can reason about things. Rice's theorem and the halting problem do not apply to add-mul machines.
Every layer of the stack. Every layer. Every layer of the stack, removing Turing completeness allows you to reason about things, right? So the reason you need to do branch prediction in a CPU, and the reason it's a prediction, is that branch predictors are, I think, like 99% accurate on CPUs. Why do they get 1% of them wrong? Well, they get 1% wrong because you can't know. Right?
That's the halting problem. It's equivalent to the halting problem to say whether a branch is going to be taken or not. I can show that. But the add-mul machine, the neural network, runs the identical compute every time. The only thing that changes is the data. So when you realize this, you think about, okay, how can we build a computer?
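A minimal Python sketch of the contrast he's drawing (a hypothetical illustration, not from the transcript): the first function's branch direction depends on the data in a way you can't resolve ahead of time, while the second runs the exact same multiply-and-add schedule for every input.

```python
def collatz_steps(n: int) -> int:
    # Whether the `n % 2 == 0` branch is taken depends on the data; whether
    # this loop even terminates for every n is a famous open question, so no
    # predictor can be right about it in general.
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def linear_layer(x, w, b):
    # An "add-mul machine": the identical multiplies and adds run in the
    # identical order for every input. Only the data flowing through changes.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


print(collatz_steps(27))         # control flow you can't know up front
print(linear_layer([1.0, 2.0],
                   [[0.5, -1.0], [2.0, 0.0]],
                   [0.1, 0.2]))  # a fixed, fully knowable op schedule
```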
How can we build a stack that takes maximal advantage of this idea? So what makes TinyGrad different from other neural network libraries is it does not have a primitive operator even for matrix multiplication. And every single other one does; they even have primitive operations for things like convolutions.
No matmul. Well, here's what a matmul is. So I'll use my hands to talk here. So if you think about a cube, and I put my two matrices that I'm multiplying on two faces of the cube, right? You can think about the matrix multiply as: okay, I'm going to do n cubed multiplies, one for each cell of the cube, and then I'm going to do a sum, which is a reduce, up to the third face of the cube.
And that's your multiplied matrix. So what a matrix multiply is, is a bunch of shape operations, right? A bunch of permutes, reshapes, and expands on the two matrices. A multiply, n cubed of them. A reduce over the n cubed products, which gives you an n squared matrix.
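A rough NumPy sketch of that picture for square n-by-n matrices (a hypothetical illustration; tinygrad's actual primitives and names differ): the reshapes and broadcasting play the role of the shape operations, the elementwise multiply fills the n-cubed cube, and the sum is the reduce onto the third face.

```python
import numpy as np

def matmul_from_shape_ops(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = a.shape[0]
    a_cube = a.reshape(n, n, 1)   # lay A on one face of the cube
    b_cube = b.reshape(1, n, n)   # lay B on another face
    products = a_cube * b_cube    # broadcasting expands both to (n, n, n): n^3 multiplies
    return products.sum(axis=1)   # reduce the n^3 products down to the n^2 result

a = np.random.rand(4, 4)
b = np.random.rand(4, 4)
assert np.allclose(matmul_from_shape_ops(a, b), a @ b)
```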