New SoTA open source TTS model from Boson AI (huggingface.co)

0 points 4 hours ago | visit original

🤖 AI Summary

Boson AI has released Higgs Audio v3, an open-source text-to-speech (TTS) model designed for voice chat applications. Where traditional TTS systems mainly read text aloud, Higgs Audio v3 generates expressive, conversational speech in over 100 languages. It supports zero-shot voice cloning and lets users control emotion, style, prosody, and sound effects through inline tokens. The model is available for research and non-commercial use; any revenue-generating application requires a separate commercial license. The release matters to the AI/ML community because it pairs conversational capability with low word and character error rates (WER/CER) across a broad range of languages, both high-resource and low-resource. Architecturally, the model uses a 4-billion-parameter autoregressive decoder and a multi-codebook approach to process and generate audio. Its control over emotion and style is aimed at making AI interactions more engaging. The model can be run through an API that supports real-time audio generation for human-computer dialogue systems.
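For readers unfamiliar with the WER/CER figures mentioned in the summary, here is a minimal sketch of how the two metrics are conventionally computed: edit distance between a reference text and a transcript, divided by the reference length. The summary does not say how Boson AI ran its evaluation; the function names below are illustrative and do not come from the model's code or API.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For a TTS model, the hypothesis is usually a speech recognizer's transcript of the generated audio and the reference is the input text, so a lower score means more intelligible speech. CER tends to be reported for languages where word boundaries are not marked by whitespace.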
