Aman Sanger
๐ค PersonAppearances Over Time
Podcast Appearances
uh suspending a lot of disbelief and assuming like you had the know-how um and operate or or if you're saying like you have to operate with like the limited information you have now no no actually i would say you swoop in and you get all the information all the little heuristics all the little parameters all the all the parameters that define how the thing is trained and
uh suspending a lot of disbelief and assuming like you had the know-how um and operate or or if you're saying like you have to operate with like the limited information you have now no no actually i would say you swoop in and you get all the information all the little heuristics all the little parameters all the all the parameters that define how the thing is trained and
uh suspending a lot of disbelief and assuming like you had the know-how um and operate or or if you're saying like you have to operate with like the limited information you have now no no actually i would say you swoop in and you get all the information all the little heuristics all the little parameters all the all the parameters that define how the thing is trained and
Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?
Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?
Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?
I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.
I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.
I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.
And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...
And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...
And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...
kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature versus then going in and writing all the codes, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?
kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature versus then going in and writing all the codes, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?
kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature versus then going in and writing all the codes, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?
Getting GNOME to go in and do all this code, right? And GNOME is like probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on like, you know, thousands of or maybe tens of thousands of like V100s, which I think GBDE3 may have been.
Getting GNOME to go in and do all this code, right? And GNOME is like probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on like, you know, thousands of or maybe tens of thousands of like V100s, which I think GBDE3 may have been.
Getting GNOME to go in and do all this code, right? And GNOME is like probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on like, you know, thousands of or maybe tens of thousands of like V100s, which I think GBDE3 may have been.
There's just so much engineering effort that has to go into all of these things to make it work. If you really brought that cost down to... like, you know, maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of like the new architecture they dreamed up that is like getting 50, 40% utilization on the GPUs.
There's just so much engineering effort that has to go into all of these things to make it work. If you really brought that cost down to... like, you know, maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of like the new architecture they dreamed up that is like getting 50, 40% utilization on the GPUs.