
Damien Tanner


The Changelog: Software Development, Open Source
The era of the Small Giant (Interview)

Most LLMs we use right now, the ones we use in coding agents, they're optimized for intelligence, not really speed.


Then when people optimize for speed, the LLM labs, they tend to optimize for just token throughput.


Very few people optimize for time to first token.


And that's all that matters in voice, is I give you the user utterance, how long is the user gonna have to wait before I can start playing back an agent response to them?


And time to first token, is that right? How long before I get the first kind of word or two that I can turn into voice and they can start hearing?
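Time to first token is straightforward to measure against any streaming response. A minimal sketch in Python, with a hypothetical `fake_stream` generator standing in for a real streaming LLM client:

```python
import time

def measure_ttft(token_stream):
    """Measure time to first token (TTFT) and total time for any
    iterable that yields tokens as they are produced."""
    start = time.monotonic()
    ttft = None
    tokens = []
    for token in token_stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token arrived
        tokens.append(token)
    total = time.monotonic() - start
    return ttft, total, tokens

# Stand-in for a streaming LLM response (hypothetical; a real client
# would yield tokens from a server-sent event stream).
def fake_stream():
    time.sleep(0.05)          # delay before the first token
    yield "Hello"
    for t in [",", " world", "!"]:
        time.sleep(0.01)      # steady token throughput afterwards
        yield t

ttft, total, tokens = measure_ttft(fake_stream())
print(f"TTFT: {ttft*1000:.0f} ms, total: {total*1000:.0f} ms")
```

For a voice agent, the TTFT number is what bounds how soon playback can begin, regardless of how fast the rest of the tokens arrive.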


The only major LLM lab that actually optimizes for this, that maintains a low TTFT, is Google, with Gemini Flash.


Most voice agents doing it this way now are using either OpenAI's GPT-4o or Gemini Flash.


GPT-4o, the OpenAI endpoints, have some annoying inconsistencies in latency.


And that's kind of the killer in voice, right?


It's a bad user experience if, you know, the first few turns of the conversation are fast and then suddenly the next turn the agent takes three seconds to respond, and you're like...


Is the agent broken?


But then once you get that first token back, you're good, because then you start streaming that text to us, and we can start turning it into full sentences.


And then again, we get to this batching problem.


The voice models that do text to voice, again, they don't stream the input. They require a full sentence of input, right, before they can start generating any output.


Because again, how you speak, how things are pronounced depends on what comes later.
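The buffering step described here, accumulating streamed LLM tokens and flushing only complete sentences to the TTS engine, can be sketched in Python. This is a hypothetical illustration, not any particular vendor's pipeline; the sentence-boundary regex is a simplifying assumption:

```python
import re

# Split after sentence-ending punctuation followed by whitespace.
# (A simplifying assumption; real pipelines handle abbreviations etc.)
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def sentences_from_stream(token_stream):
    """Buffer streamed tokens and yield each sentence as soon as it is
    complete, since the TTS model needs the whole sentence before it
    can pronounce anything."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a complete sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():          # flush whatever remains at end of stream
        yield buffer.strip()

tokens = ["Hi", " there", ". How", " can", " I", " help", " you", " today", "?"]
print(list(sentences_from_stream(tokens)))
# ['Hi there.', 'How can I help you today?']
```

The first sentence is released the moment its terminator arrives, so speech synthesis can begin while the model is still generating the rest of the turn.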