Jyunmi
Right now, all the models are different enough in what they're best at that you kind of want to use them all, each in the right use case, or at least use them in conjunction with each other.
I just saw an open source project that tied them all up into sort of a council of experts situation. You'd have an orchestration layer: your query would be sent off, and the orchestrator would determine which model the query best fits.
Every model would answer, and then they would deliberate on what the answer was and seek alignment through that process, right?
So it's sort of the mechanism that's already happening within each model, but now happening across all of these separate models.
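The council-of-experts flow described here can be sketched in a few lines. This is a minimal illustration, not the actual project mentioned: the model functions are stubs standing in for real API calls to different providers, and the "deliberation" step is simplified to a majority-agreement pick rather than a true multi-round debate.

```python
# Sketch of the "council of experts" pattern: every model answers
# independently, then a deliberation step seeks alignment.
# All model functions below are hypothetical stubs, not real APIs.

def ask_claude(query: str) -> str:
    return f"claude-answer({query})"

def ask_gemini(query: str) -> str:
    return f"gemini-answer({query})"

def ask_gpt(query: str) -> str:
    return f"gpt-answer({query})"

MODELS = {"claude": ask_claude, "gemini": ask_gemini, "gpt": ask_gpt}

def council(query: str):
    # Step 1: every model answers the query independently.
    answers = {name: fn(query) for name, fn in MODELS.items()}

    # Step 2: "deliberation" — stubbed here as picking the answer that
    # the most models agree on verbatim. A real system would instead
    # prompt each model with the full set of answers and iterate.
    transcript = "\n".join(f"{n}: {a}" for n, a in answers.items())
    consensus = max(
        answers.values(),
        key=lambda a: sum(a == b for b in answers.values()),
    )
    return consensus, transcript
```

A real implementation would also need the routing step the speaker mentions (deciding which models to convene for a given query), which is omitted here for brevity.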
But that kind of goes back to what I was talking about: it really comes down to, what kind of use case am I going for?
If Gemini is better at assimilating dozens of documents and a huge amount of context and distilling that into a more compact form, and then I need to turn that into a coding project, I'll put that into Claude for coding.
And if that code then needs another model to generate something creative, then I'm going to use that one.
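That hand-off between models, one distilling a large context, the next generating code from the distilled result, is just a sequential pipeline. Here is a minimal sketch under the speaker's assumptions; both model functions are illustrative stubs, not real Gemini or Claude APIs.

```python
# Sketch of chaining models by strength: a long-context model distills
# many documents into a compact spec, then a coding-strong model turns
# that spec into code. Both functions are hypothetical stand-ins.

def distill_with_gemini(docs: list[str]) -> str:
    # Stand-in for a long-context model compressing documents to a brief.
    # (Here, crudely: the first 20 characters of each document.)
    return "spec: " + "; ".join(d[:20] for d in docs)

def code_with_claude(spec: str) -> str:
    # Stand-in for a coding-strong model generating code from the spec.
    return f"# generated from {spec!r}\ndef solution(): ..."

def pipeline(docs: list[str]) -> str:
    spec = distill_with_gemini(docs)
    return code_with_claude(spec)
```

The design point is that each stage only sees the previous stage's output, so each model is used where it is strongest rather than asking one model to do everything.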
And so I think we're still at the point where there isn't a "okay, I'm going to be in the Claude camp, or the Gemini camp, or the OpenAI camp, and that's all I'm going to use."
Because even if we were just going by the benchmarks that Andy showed us, they showed clearly different sections of what is being tested: reasoning versus agentic coding versus creativity or whatnot, and each grouping of models seemed to be better at one section or another.
So yeah, I'm with you, Carl. I don't think the benchmarks are terribly useful for people who just want to see what they can do with these models.
I think as long as it's going up and to the right on a chart, that's what we want to see.
The closer the percentages get to 100, the better; that's kind of the direction we want to go in.
And knowing enough about which models are really good at any particular feature or subset, and weaving those together, might be the end result.
All until we can run our own version on our own hardware for our own purposes.
And then just send out queries whenever we absolutely need to.