Kwasi Ankomah
that is just not acceptable, right?
Most production applications don't have a latency budget of 20 to 30 seconds, especially if there's a user interacting with it.
So I would say that the user experience has become a key factor.
As these things go into production, how do we actually make sure that the user is having a good time and that we can scale it, right?
Like what happens when, instead of having 200 users, we have 2,000 users, right?
Because again, you're running inference, and you need to scale that inference to the number of users.
That becomes a problem because usually the more users you have, the more pressure the hardware is under to get the inference out, and the slower the model gets again.
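That shared-capacity point can be sketched with a back-of-envelope model: a fixed pool of hardware has some total token throughput, and as concurrent users grow, each user's share shrinks and perceived latency grows. All the numbers below are illustrative assumptions, not real benchmarks.

```python
# Assumed aggregate throughput of the whole serving fleet (tokens/sec)
TOTAL_TOKENS_PER_SEC = 10_000
# Assumed average response length in tokens
RESPONSE_TOKENS = 500

def seconds_per_response(concurrent_users: int) -> float:
    """Rough time to stream one full response when capacity
    is shared evenly across all concurrent users."""
    per_user_rate = TOTAL_TOKENS_PER_SEC / concurrent_users
    return RESPONSE_TOKENS / per_user_rate

for users in (200, 2_000):
    print(users, "users ->", seconds_per_response(users), "s per response")
```

Under these toy assumptions, going from 200 to 2,000 users stretches each response tenfold, which is exactly the "more users, slower model" pressure described above.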
So I would say that that's probably the biggest one.
And the second is the reality of where the cost is going.
I come from a financial background, and when AI first arrived, everyone was like, it's a bit like the cloud, you know, this is great.
No one was checking their bills, and now you see what inference is costing you.
And suddenly you're like, well, hang on, this inference is becoming most of our cost.
And, you know, milliseconds of difference can actually mean millions in operational cost.
When you scale it up to, you know, 20 million users, 30 million users, getting tokens out faster is just going to cost you less money.
So, you know, one of the things here is that inference itself is becoming an expensive thing.
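The "faster tokens cost less" claim comes down to simple arithmetic: the same monthly traffic served on the same hourly-priced hardware needs fewer GPU-hours when per-GPU throughput is higher. This is a hypothetical illustration; every figure below is an assumption.

```python
# Assumed hourly price of one GPU
GPU_PRICE_PER_HOUR = 2.50
# e.g. 20 million users, one request each per month (assumed)
MONTHLY_REQUESTS = 20_000_000
# Assumed average response length in tokens
TOKENS_PER_REQUEST = 500

def monthly_cost(tokens_per_sec_per_gpu: float) -> float:
    """Hardware cost to serve the month's traffic at a given
    per-GPU token throughput."""
    total_tokens = MONTHLY_REQUESTS * TOKENS_PER_REQUEST
    gpu_hours = total_tokens / tokens_per_sec_per_gpu / 3600
    return gpu_hours * GPU_PRICE_PER_HOUR

slow = monthly_cost(1_000)   # baseline throughput
fast = monthly_cost(1_500)   # 50% faster token generation
print(f"baseline: ${slow:,.0f}/mo, faster: ${fast:,.0f}/mo, saved: ${slow - fast:,.0f}/mo")
```

At real production scale the absolute numbers are much larger, but the shape is the same: throughput in the denominator means every gain in tokens per second comes straight off the bill.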
So those would be the two things, I think, from a business perspective, and something people can relate to: you're not going to have a good experience if you have slow AI, and it's going to start costing you money.
The last thing I'd also say is that there are now certain applications, and I'll use voice because I think it's the real, you know, as...