Damien Tanner
I mean, you can stream the input, but you need the complete thing,
the complete question, before you can make the request to the LLM and have it start generating a response, right?
There is no duplex LLM that takes input and generates output at the same time.
Technically.
Yeah, yeah, yeah.
So we can do that, because we have the partial transcripts.
We can stream you the partial transcripts and then say, okay, now it's done.
Now make the LLM call.
Then you make the LLM call.
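A minimal sketch of that flow, assuming hypothetical event names (`partial` / `final`) for the transcript stream rather than any real provider's API:

```typescript
// Sketch: buffer partial transcripts as they stream in, and only when
// the final transcript arrives, hand the complete question to the LLM.
// The event shape and the "partial"/"final" names are assumptions.
type TranscriptEvent =
  | { type: "partial"; text: string }
  | { type: "final"; text: string };

class TranscriptBuffer {
  private latest = "";

  constructor(private onComplete: (text: string) => void) {}

  handle(event: TranscriptEvent): void {
    // Each partial supersedes the previous one; only the final
    // transcript triggers the LLM call.
    this.latest = event.text;
    if (event.type === "final") {
      this.onComplete(this.latest); // "okay, now it's done" -- make the LLM call
      this.latest = "";
    }
  }
}
```

Used like this, partials arrive and are ignored until the final event fires the callback:

```typescript
const buf = new TranscriptBuffer((question) => callLLM(question));
buf.handle({ type: "partial", text: "What is" });
buf.handle({ type: "final", text: "What is the weather today?" });
```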
But interestingly, sending text is actually super fast in the context of a voice conversation, right?
And actually the default example is crazy.
I didn't think this would work until we tried it, but it just uses a webhook.
When the user finishes speaking, the basic example sends your Next.js API route a webhook with the user text.
And it turns out that sending a webhook with a few sentences of text in it is fine.
That's fast.
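A sketch of what that webhook receiver could look like, written in the style of a Next.js route handler; the payload shape (`{ text }`), the reply shape, and the `generateReply` helper are all assumptions for illustration:

```typescript
// Sketch of the webhook side: when the user finishes speaking, the
// service POSTs the user's text to your API route. The payload and
// reply shapes here are assumptions, not a documented contract.
async function handleSpeechWebhook(request: Request): Promise<Response> {
  const { text } = (await request.json()) as { text: string };

  // The finished utterance is just a few sentences of text, so the
  // webhook hop itself is cheap; the LLM call below is the slow part.
  const reply = await generateReply(text);
  return Response.json({ reply });
}

// Hypothetical stand-in for the real LLM request.
async function generateReply(text: string): Promise<string> {
  return `You said: ${text}`;
}
```

In a Next.js App Router project, a function with this signature could be exported as `POST` from a `route.ts` file.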
It's all the other stuff, like then waiting for the LLM to respond.
Yeah, and we've got a WebSocket endpoint now, so we can kind of shave off that HTTP connection and everything.
But yeah, that's when the big, heavy latency items come in.
Like generating an LLM response.