Damien Tanner

👤 Speaker
791 total appearances

Podcast Appearances

The Changelog: Software Development, Open Source
The era of the Small Giant (Interview)

We didn't touch on it, but interruptions are this other really difficult dynamic part, where while the agent is speaking its response to you, if the user starts speaking again, you then need to decide in real time whether the user is interrupting the agent.

Or are they just going, mm-hmm, yeah, and agreeing with the agent?

Or are they trying to say, ooh, stop?
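
The decision described here (agreement noises versus a real interruption) can be illustrated with a deliberately naive sketch. It assumes the overlapping speech has already been transcribed to text; the word list and the `classify_overlap` name are hypothetical, and a production system would presumably also weigh timing, volume, and background noise rather than rely on a word list.

```python
# Hypothetical list of "agreement" sounds; not taken from the interview.
BACKCHANNELS = {"mm-hmm", "mhm", "uh-huh", "yeah", "yep", "right", "ok", "okay", "sure"}


def classify_overlap(transcript: str) -> str:
    """Label speech that overlaps the agent's reply.

    Returns "noise" for an empty transcript, "backchannel" when every word
    is an agreement sound, and "interrupt" for anything else.
    """
    # Normalise case and strip punctuation from the edges of each word.
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    words = [w for w in words if w]
    if not words:
        return "noise"
    if all(w in BACKCHANNELS for w in words):
        return "backchannel"
    return "interrupt"


if __name__ == "__main__":
    for sample in ["mm-hmm, yeah", "Ooh, stop", ""]:
        print(repr(sample), "->", classify_overlap(sample))
```

Treating everything that is not clearly a backchannel as an interruption errs on the side of stopping the agent, which is only one possible design choice.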

I bet that's a hard problem to solve.

We have to still be transcribing audio even when the user's hearing it.

And we've got to deal with background noise and everything.

And then when we're confident the user is trying to interrupt the agent, we've then got to do this whole kind of state change where we tear down all of this in-flight LLM request, in-flight voice generation request,

And then as quickly as possible, start focusing on the user's new question.

And especially if their interruption is really short, like stop.

Suddenly you've got to tear down all the old stuff, transcribe that word stop, then ship that as a new LLM request to the backend, generate the response, and then get the agent speaking back as quickly as possible.
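
The tear-down-and-restart sequence described here could look roughly like the following sketch. It is not the speaker's implementation: `TurnManager` and its methods are hypothetical names, and the two `asyncio.sleep` calls merely stand in for the in-flight LLM request and the voice generation request.

```python
import asyncio
from typing import List, Optional


class TurnManager:
    """Keeps at most one agent response in flight at a time."""

    def __init__(self) -> None:
        self.task: Optional[asyncio.Task] = None
        self.events: List[str] = []

    async def _respond(self, user_text: str) -> None:
        try:
            await asyncio.sleep(0.05)  # stand-in for the LLM request
            await asyncio.sleep(0.05)  # stand-in for voice generation
            self.events.append(f"spoke reply to: {user_text}")
        except asyncio.CancelledError:
            # Tear-down path: the response was interrupted mid-flight.
            self.events.append(f"cancelled reply to: {user_text}")
            raise

    async def handle_user_turn(self, user_text: str) -> None:
        # Cancel the old response and wait until it has fully stopped
        # before starting work on the new user input.
        if self.task is not None and not self.task.done():
            self.task.cancel()
            await asyncio.gather(self.task, return_exceptions=True)
        self.task = asyncio.create_task(self._respond(user_text))


async def demo() -> List[str]:
    manager = TurnManager()
    await manager.handle_user_turn("tell me about pricing")
    await asyncio.sleep(0.01)  # the agent has started working on a reply
    await manager.handle_user_turn("stop")  # a short interruption arrives
    await manager.task
    return manager.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Awaiting the cancelled task before creating the new one is the part that matters in this sketch: it ensures the old response has stopped producing output before the new one begins.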

It's all happening down one pipe, as it were, at the end of the day.

It's like audio from the browser microphone, and then audio replaying back.

And we would have bugs like you'd interrupt the agent, but then...

When it started replying, there'd still be a few chunks of 20-millisecond audio from the old response snuck in there.

Or, you know, the old audio would be interleaved with the new audio from the agent back.

And you're kind of in, you know, Audacity or something, some audio editor, trying to work out like, what's going on, why does it sound like this?

And you're like rearranging bits of audio going, ah, okay.

The responses are taking turns every 20 milliseconds.

It's interleaving the two responses to try and work out what's going on.
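
One common way to guard against this kind of interleaving bug is to tag every outgoing audio chunk with the turn that produced it and drop anything from an older turn. The interview does not say how the bug was actually fixed, so the sketch below is only an illustration; `AudioChunk`, `PlaybackGate`, and `playable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AudioChunk:
    turn_id: int    # which agent response produced this chunk
    payload: bytes  # e.g. 20 ms of encoded audio


class PlaybackGate:
    """Lets through only chunks that belong to the current turn."""

    def __init__(self) -> None:
        self.current_turn = 0

    def begin_new_turn(self) -> int:
        # Called when an interruption is confirmed; older chunks become stale.
        self.current_turn += 1
        return self.current_turn

    def accept(self, chunk: AudioChunk) -> bool:
        return chunk.turn_id == self.current_turn


def playable(gate: PlaybackGate, chunks: Iterable[AudioChunk]) -> List[bytes]:
    """Return the payloads that should reach the listener, in arrival order."""
    return [c.payload for c in chunks if gate.accept(c)]


if __name__ == "__main__":
    gate = PlaybackGate()
    old = gate.begin_new_turn()  # first response
    new = gate.begin_new_turn()  # user interrupted, second response starts
    # Chunks from both responses arrive interleaved down the same pipe.
    stream = [
        AudioChunk(old, b"old-0"),
        AudioChunk(new, b"new-0"),
        AudioChunk(old, b"old-1"),
        AudioChunk(new, b"new-1"),
    ]
    print(playable(gate, stream))
```

With a check like this at the last hop before playback, late chunks from a cancelled response are discarded even if the upstream tear-down is slow.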