Show HN: MiraTTS, a 48kHz Open-Source TTS at 100x Real-Time Speed (github.com)

0 points | 199 days ago

🤖 AI Summary

MiraTTS is an open-source text-to-speech (TTS) model, finetuned from Spark-TTS to improve audio realism and stability. It generates 48kHz audio at over 100x real-time speed, which the project attributes to optimizations with LMDeploy and FlashSR. The model runs within 6GB of VRAM and reports latency of approximately 100ms, which makes it a candidate for applications that need quick audio responses. The release brings capabilities usually associated with proprietary TTS models to an open-source project. MiraTTS supports low-latency streaming and multilingual use, and the repository includes installation instructions, sample usage, and notes on optimizing LLM-based TTS models.
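To make the "100x real-time" figure concrete, here is a minimal sketch of the arithmetic behind it. The function names and the timing numbers are hypothetical, chosen for illustration; they are not taken from the MiraTTS repository or its benchmarks.

```python
SAMPLE_RATE_HZ = 48_000  # output sample rate stated in the summary


def speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def num_samples(audio_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a mono clip of the given duration."""
    return round(audio_seconds * sample_rate_hz)


# Hypothetical measurement: 50 s of audio generated in 0.5 s of wall time.
print(speedup(50.0, 0.5))  # 100.0, i.e. 100x real-time
print(num_samples(50.0))   # 2400000 samples at 48kHz
```

A speedup of 100 means one minute of speech would take roughly 0.6 seconds to synthesize. Throughput figures like this are usually measured over batched generation, so they do not by themselves determine the latency of a single request.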
