Kwasi Ankomah
They swapped that out for, like, a Llama 8B.
You don't need 600 billion parameters.
You don't need it.
And that does two things, right, Corey?
Firstly, it speeds up your application because we can run that at a ludicrous tokens per second.
And probably the most important, the cost of inference reduces dramatically, right?
Because you are no longer paying top dollar for that premium model.
You're actually swapping it for something where the difference in cost is, essentially, you're talking 40, 50 times cheaper. So, like I was telling you, if you take that slice of cost that was costing you a million and you just swap that out, it really reduces it. So I think that's kind of the second point: we say that the inference matters, but just the fact that we can swap these models in and out and give you that choice on your hardware starts to really build that power and efficiency story that I like to talk about.
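The savings the speaker describes are simple arithmetic; a quick sketch, using the speaker's own illustrative figures (the $1M slice and the conservative end of the "40-50 times cheaper" range, not measured numbers):

```python
# Illustrative only: numbers are the speaker's examples, not real pricing.
premium_cost = 1_000_000          # annual $ for the workload slice on the premium model
cheaper_factor = 40               # conservative end of "40-50 times cheaper"

swapped_cost = premium_cost / cheaper_factor
savings = premium_cost - swapped_cost

print(swapped_cost)   # 25000.0  -- same slice on the smaller model
print(savings)        # 975000.0 -- roughly 97.5% of the cost removed
```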
Yeah.
Yeah.
It's so cool.
It's a fair question.
You've got two ways of doing this.
If you are working with us on a cloud basis, like anything, you can just pick the model.
But if you actually have this hardware yourself, you can load up what we call essentially a bundle.
So you can choose what you need for your use case.
So then it's there.
Then, at that point, Ryan, you just pick it like you would any other model.
Everything to the end user, once you've done that, is just an OpenAI-compatible API.
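Because the endpoint looks like the OpenAI API to the end user, swapping the underlying model is just changing one string in the request body. A minimal stdlib sketch of that idea (the model names are hypothetical placeholders, not the provider's actual catalog):

```python
# Sketch of the "it's just an OpenAI-compatible API" point: the chat request
# payload is identical whichever model is loaded; only the "model" field changes.
# Model names below are made-up placeholders, not real identifiers.

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

premium = build_chat_request("premium-600b-model", "Summarize this ticket.")
swapped = build_chat_request("llama-8b-instruct", "Summarize this ticket.")

# Everything except the model string is unchanged for the caller.
assert premium["messages"] == swapped["messages"]
print(premium["model"], "->", swapped["model"])
```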