Two Voice Devs

Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

29 Aug 2025

Description

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?

[00:00:00] Introduction
[00:00:45] Google's new experimental TTS model for Gemini
[00:01:55] Demo of single-speaker TTS in Google's AI Studio
[00:03:05] Code walkthrough for single-speaker TTS
[00:04:30] Lack of fine-grained control compared to SSML
[00:05:15] Using text cues to shape the TTS output
[00:06:20] Demo of multi-speaker TTS with a script
[00:09:50] Code walkthrough for multi-speaker TTS
[00:11:30] The model is tuned for TTS, not general conversation
[00:12:10] Using a separate LLM to generate a script for the TTS model
[00:13:30] Code walkthrough of the two-function approach with LangChainJS
[00:16:15] LangChainJS integration details
[00:19:00] Is Speech Markdown still relevant?
[00:21:20] Latency issues with the current TTS model
[00:22:00] Caching strategies for TTS
[00:23:30] Voice as the natural UI for AI
[00:25:30] Outro

#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast
