Arvind Narayanan
And so the total amount that companies are spending on inference is actually going to increase. In an application like a chatbot, let's say, you know, it's text in, text out, no big deal. I think costs are going to come down. Even if someone is chatting with a chatbot all day, it's probably not going to get too expensive.
On the other hand, if you want to scan all of someone's emails, for instance, right? If a model gets cheaper, you know, you're just going to have it running always on in the background. And then from emails, you're going to get to all their documents, right? And some of those attachments might be many megabytes long.
And so there, even with Moore's law, I think cost is going to be significant in the medium term. And then you get to applications like writing code, where what we're seeing is that it's actually very beneficial to let the model do the same task tens of times, thousands of times, sometimes literally millions of times and pick the best answer.
So in those cases, it doesn't matter how much cost goes down. You're going to just proportionally increase the number of retries so that you can get a better quality of output.
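The retry-and-pick-the-best pattern described here is often called best-of-N sampling. A minimal sketch, with a made-up `generate` function standing in for one model run and a made-up quality score, just to show the shape of the loop:

```python
import random

def generate(task, seed):
    # Stand-in for one model run on the task; here just a
    # deterministic pseudo-random "quality" draw per attempt.
    rng = random.Random(seed)
    return {"answer": f"attempt-{seed}", "score": rng.random()}

def best_of_n(task, n):
    # Run the same task n times and keep the highest-scoring answer.
    # Doubling n roughly doubles inference cost, which is why cheaper
    # models tend to get spent on more retries rather than saved.
    candidates = [generate(task, seed) for seed in range(n)]
    return max(candidates, key=lambda c: c["score"])

best = best_of_n("write a sorting function", n=100)
print(best["answer"], round(best["score"], 3))
```

In a real system the scoring step might be a unit-test pass rate (for code) or a reward model, but the budget trade-off is the same: quality improves with n, and so does cost.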
So there is training compute, which is when the developer is building the model. And then there is inference compute, when the model is being deployed and the user is using it to do something. And it might seem like really the training cost is the one we should worry about, since it's trained on all of the text on the internet or whatever.
But it turns out that over the lifetime of a model, when you have billions of people using it, the inference cost actually adds up. And for many of the popular models, that's the cost that dominates. So let's talk about each of those two costs.
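The "inference adds up" point is just arithmetic. A back-of-the-envelope sketch with entirely made-up numbers (the training cost, per-query cost, and query volume are illustrative, not figures from the conversation):

```python
# Hypothetical numbers: one-time training cost vs. per-query inference
# cost accumulated over a year of heavy use.
training_cost = 100e6      # $100M to train, paid once (made-up)
cost_per_query = 0.002     # $0.002 of compute per query (made-up)
queries_per_day = 1e9      # a billion queries a day (made-up)
days_in_service = 365

inference_total = cost_per_query * queries_per_day * days_in_service
print(f"training:  ${training_cost:,.0f}")
print(f"inference: ${inference_total:,.0f}")
```

Even at a fraction of a cent per query, a billion daily queries puts the year's inference bill well above the one-time training bill, which is the sense in which inference dominates for popular models.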
With respect to training costs, if you want to build a smaller model at the same level of capability or without compromising capability too much, you have to actually train it for longer. So that increases training costs. But that's maybe okay because you have a smaller model. You can push it to the consumer device or even if it's running on the cloud, your server costs are lower.
So your training cost increases, your inference cost decreases. But because it's the inference cost that dominates, if you have the same workload and a smaller model doing it, the total cost is probably going to come down.
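That trade-off can be sketched with a toy cost model (all numbers hypothetical): the smaller model is trained longer, so its training cost is higher, but its per-query cost is lower, and at high enough usage the total still falls.

```python
# Toy lifetime-cost model with made-up numbers: training is a one-time
# cost, inference scales with the number of queries served.
def total_cost(training, per_query, num_queries):
    return training + per_query * num_queries

queries = 500e9  # lifetime queries served (made-up)

# Big model: cheaper to train, pricier to serve.
big = total_cost(training=50e6, per_query=0.002, num_queries=queries)
# Small model trained longer: pricier to train, cheaper to serve.
small = total_cost(training=80e6, per_query=0.0005, num_queries=queries)

print(f"big:   ${big:,.0f}")
print(f"small: ${small:,.0f}")
```

With these numbers the small model's extra $30M of training is repaid many times over by the lower per-query cost, which is the argument in the passage: when inference dominates, shrinking the model wins even though training gets more expensive.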