Alex Atallah
We are memory constrained, I think, first on server memory.
And I don't think we're CPU constrained quite yet.
That was the first problem we aimed to solve.
Given a large set of providers for a given model, how do we really perfect the router to send you to the model that will be up as quickly as possible and send you to the provider that can best serve the parameters requested?
We've been doing pretty well there, but I mean, we can do even better and it will get better soon.
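As a rough illustration of the routing problem described here, the sketch below filters providers by whether they support the requested parameters and then prefers the one with the best recent uptime, breaking ties on latency. Everything in it is an assumption made for illustration: the `Provider` fields, the `route` function, and the scoring rule are hypothetical and are not the actual router discussed in the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical record of what a router might track per provider."""
    name: str
    uptime: float          # assumed: fraction of recent requests that succeeded (0..1)
    latency_ms: float      # assumed: recent median latency in milliseconds
    supported_params: frozenset = field(default_factory=frozenset)


def route(providers, requested_params):
    """Pick a provider for one request.

    1. Keep only providers that support every requested parameter.
    2. Among those, prefer the highest uptime, then the lowest latency.
    Returns None when no provider can serve the request.
    """
    requested = set(requested_params)
    eligible = [p for p in providers if requested <= p.supported_params]
    if not eligible:
        return None
    return max(eligible, key=lambda p: (p.uptime, -p.latency_ms))


if __name__ == "__main__":
    providers = [
        Provider("a", 0.99, 400.0, frozenset({"temperature"})),
        Provider("b", 0.97, 250.0, frozenset({"temperature", "tools"})),
        Provider("c", 0.90, 200.0, frozenset({"temperature", "tools"})),
    ]
    choice = route(providers, ["temperature", "tools"])
    print(choice.name if choice else "no provider available")
```

A real router would presumably weigh many more signals, such as price, region, and current capacity; the two-key sort here only shows the shape of the decision.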
We do a lot of internal benchmarking against the rest of the market and against going direct to providers.
And so we can kind of track progress and hill climb internally.
I think what becomes a real constraint for us is brand new models that don't have much capacity because only one provider is serving them.
Sometimes we host them all ourselves or we work with a provider to do it.
And sometimes we just tell other providers all the market signals that we're seeing and say, guys, you should host this model.
Like, look, in this region of the world, there's this going on.
And we blast this out.
And that sometimes solves the problem.