Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AI Breakdown

arxiv preprint - F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

14 Oct 2024

Description

In this episode, we discuss F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching by Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen. F5-TTS is a fully non-autoregressive text-to-speech system that utilizes flow matching with Diffusion Transformer (DiT) and addresses limitations of previous systems like E2 TTS by padding text inputs with filler tokens to match speech input lengths. It includes ConvNeXt for refining text representations and employs a new Sway Sampling strategy to enhance performance during inference without retraining. The system achieves a rapid inference real-time factor of 0.15 while providing high-quality speech synthesis, capable of zero-shot performance and code-switching, and is trained on a 100K-hour multilingual dataset with resources available for community use.

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.