Damien Tanner
And you connect up to the Layercode voice layer and put in our browser SDK.
And then you get a little voice agent microphone button and everything in the web app.
We also connect to phone over Twilio.
And then for every turn of the conversation, whenever the user finishes speaking, we ship that transcript to your backend.
You call the LLM of your choice.
You do your tool calls, everything you need to do to generate a response like you normally do for a text agent.
Then you start streaming the response tokens back to us.
And then as soon as we get that first word, we start converting that text to speech and start streaming back to the user.
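The turn loop described above can be sketched as follows. This is purely illustrative, not Layercode's actual API: `fake_llm_stream`, `handle_turn`, and the `"TTS-START"` marker are hypothetical names standing in for your LLM call and the point where text-to-speech begins.

```python
# Hypothetical sketch of one conversation turn: user speech has been
# transcribed, your backend streams LLM tokens, and TTS starts as soon
# as the first token arrives rather than waiting for the full response.

def fake_llm_stream(transcript):
    # Stand-in for the LLM of your choice streaming response tokens.
    for token in ["Sure,", " I", " can", " help", " with", " that."]:
        yield token

def handle_turn(transcript, tts_sink):
    """Called when the user finishes speaking: run the LLM and begin
    text-to-speech on the first token to keep latency low."""
    first_token_seen = False
    for token in fake_llm_stream(transcript):
        if not first_token_seen:
            tts_sink.append("TTS-START")  # speak immediately; don't buffer
            first_token_seen = True
        tts_sink.append(token)
    return tts_sink
```

The key design point is that speech synthesis begins on the first streamed token, which is what keeps the response under a second or two.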
And so there's a bunch of stuff you have to do to make that really low latency, make that a real-time conversation where you're not waiting more than a second or two for the agent to respond.
So we put a lot of work into refining that.
And there's also a lot of exciting innovation happening in the model space for voice models, whether it's the transcription or the text-to-speech.
And so we give you the freedom to switch between those models so you can try out some of the different voice models: some that are really cheap and have really casual voices, and some, like ElevenLabs, that are much more expensive but have very professional, clean voices.
And you can find the right trade-off for the kind of experience you want.
There's a lot of trade-offs, right?
In voice between latency, price, quality.
So we let users explore that and find the right fit for their voice agent.
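That latency/price/quality trade-off could be expressed as a per-agent configuration. The shape below is purely illustrative (the conversation doesn't show Layercode's actual schema), and the provider names are just examples of the cheap-versus-premium spectrum mentioned above.

```python
# Hypothetical voice-agent profiles; not a real Layercode config schema.
voice_profiles = {
    # cheaper, faster, more casual-sounding voice
    "budget": {"tts_provider": "playht", "latency": "low", "cost": "low"},
    # pricier but polished, professional-sounding voice
    "premium": {"tts_provider": "elevenlabs", "latency": "medium", "cost": "high"},
}

def pick_profile(profiles, priority):
    """Pick a profile by what you optimize for: 'cost' or 'quality'."""
    return profiles["budget"] if priority == "cost" else profiles["premium"]
```

Letting users swap providers per agent is what makes it practical to explore that trade-off space without rewriting the backend.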
Yeah.
We handle everything related to the voice basically.
And we let you just handle text, like a text chat bot basically.