Dylan Patel
Because the end of pre-training is when you increase the context length for these models. And we talked earlier in the conversation about how, with long context, a long input is much easier to manage than a long output. And a lot of these post-training and reasoning techniques rely on a ton of sampling, and it's becoming increasingly long-context.
So, effectively, your compute efficiency goes down. I think FLOPS is the standard for how you measure it. But with RL, you have to do all these things where you move your weights around in a different way than in pre-training or plain generation. It's going to become less efficient, and FLOPS is going to be a less useful term.
And then as the infrastructure gets better, it's probably going to go back to FLOPS.
You know, if it doesn't exist already.
Well, it's easier to stay. It's harder to switch than it is to keep doing it. There are big fees for switching, too.
Yeah, one day Amazon Prime will triple in price.
Yeah, one would think.
I mean, NVIDIA's entire culture is designed from the bottom up to do this. There's this recent book, The Nvidia Way by Tae Kim, that details this: how they look for future opportunities and ready their CUDA software libraries so that new applications of high-performance computing can be built very rapidly on CUDA and NVIDIA chips.