Corey Knowles
Podcast Appearances
We just saw a model drop a couple of days ago where, in order to get it two and a half times faster, the cost went up 6x.
Well, I've got to say at a dollar per million output tokens, you seem to be doing okay with that.
I keep thinking, when we talk about GPUs and what it takes to do the autoregressive approach, you know, this kind of...
in a lot of ways could be a smart way to sort of sidestep things like the current memory supply issue, the need to go acquire Brink's trucks of money and back them up at Jensen Huang's patio door.
You know, I really think that this is an interesting approach at a prime time for that.
That makes sense.
That makes sense.
So how does this behave with long context?
Are we looking at it getting specifically any more expensive, or is parallelism playing more of a role as contexts grow?
You know, I would say, probably outside of most, you know, maybe enterprise and software applications, if you're dealing with what, say, the average worker uses, a 100K context is plenty for most things.
You can really do a lot in that range.
Yeah.
I was kind of wondering how coherence plays together by not going left to right.
And I guess that's me thinking of it through how my human brain works.
Same as with, if I'm running, you know, Stable Diffusion and ComfyUI or something, and you're doing your denoising runs.
I've been eyeballing how to connect it to my OpenClaw.
Oh my gosh.
Excellent.
Excellent, excellent.