George Hotz
Podcast Appearances
By the way, I really like PyTorch. I think that it's actually a very good piece of software. I think that they've made a few different trade-offs, and these different trade-offs are where TinyGrad takes a different path. One of the biggest differences is it's really easy to see the kernels that are actually being sent to the GPU.
If you run PyTorch on the GPU and do some operation, you don't know what kernels ran. You don't know how many kernels ran, how many FLOPs were used, or how many memory accesses were made. In TinyGrad, type DEBUG=2 and it will show you, in this beautiful style, every kernel that's run, how many FLOPs and how many bytes.
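For concreteness, here is a minimal sketch of that workflow, assuming a recent tinygrad release where `Tensor` is importable from the top-level package and the `DEBUG` environment variable controls kernel logging (the exact output format varies by version):

```python
# Minimal sketch: run one tinygrad op with kernel-level debug output.
# DEBUG must be set before tinygrad is imported (or pass it on the shell:
#   DEBUG=2 python this_script.py).
import os
os.environ["DEBUG"] = "2"

from tinygrad import Tensor  # assumes a recent tinygrad install

a = Tensor.rand(1024, 1024)
b = Tensor.rand(1024, 1024)
c = (a @ b).realize()  # with DEBUG=2, each kernel launch is printed,
                       # along with its FLOP count and bytes moved
```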
TinyGrad solves the problem of porting to new ML accelerators quickly. One of the reasons: there are tons of these companies now. I think Sequoia marked Graphcore to zero, right? Cerebras, Tenstorrent, Groq. All of these ML accelerator companies built chips. The chips were good. The software was terrible. And I think the same problem is happening with Dojo.
It's really, really hard to write a PyTorch port because you have to write 250 kernels and you have to tune them all for performance.
Look, my prediction for Tenstorrent is that they're going to pivot to making RISC-V chips. CPUs.
Because AI accelerators are a software problem, not really a hardware problem.
I think what's going to happen is, if I can finish... Okay. If you're trying to make an AI accelerator, you'd better have the capability of writing a PyTorch-level performance stack on NVIDIA GPUs.