🤖 AI Summary
Vāgdhenu is an open-source text-to-speech (TTS) system that automatically detects the meter of Sanskrit verses entered in any Indian script and renders them as chanted recitations. It is notable for the AI/ML community because it pairs neural TTS with the linguistic demands of Sanskrit, showing how machine learning can be adapted to the phonetic and metrical characteristics of a specific language. The first chant takes between 10 and 60 seconds as the model warms up; the tool's strengths are its voice mapping and meter detection.
Built on a carefully designed corpus of about five hours of solo Sanskrit chanting, Vāgdhenu uses a flow-matching TTS backbone that has been retrained to better render complex phonetic elements. Key technical features include a script-aware frontend, careful handling of linguistic rules, and a vṛtta-aware mechanism that recognizes metrical patterns. The model achieved a mean opinion score (MOS) of about 4.6, indicating high-quality synthesis even for challenging conjuncts and tonal variations. Vāgdhenu has also produced two extensive corpora, making Sanskrit literature more accessible in digital form.
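To make the idea of vṛtta (meter) recognition concrete, here is a minimal sketch of classical syllable-weight scansion. It is not Vāgdhenu's implementation: the function names `scan` and `detect_meter`, the four-entry meter table, and the restriction to IAST-romanized input are all assumptions made for illustration. Each syllable is marked guru (G) if its vowel is long, is followed by anusvāra or visarga, or precedes two or more consonants; otherwise it is laghu (L). The sketch ignores the traditional option of treating a line-final syllable as heavy, and it always reads `ai` and `au` as diphthongs.

```python
import re
import unicodedata

# Tokenizer for IAST: aspirated stops first, then diphthongs, vowels,
# plain consonants, and finally anusvara/visarga.
TOKEN = re.compile(
    r"[kgcjṭḍtdpb]h|ai|au|[aāiīuūṛṝḷeo]|[kgṅcjñṭḍṇtdnpbmyrlvśṣsh]|[ṃṁḥ]"
)
SHORT_VOWELS = {"a", "i", "u", "ṛ", "ḷ"}
LONG_VOWELS = {"ā", "ī", "ū", "ṝ", "e", "o", "ai", "au"}
VOWELS = SHORT_VOWELS | LONG_VOWELS
CODA_MARKS = {"ṃ", "ṁ", "ḥ"}

# The traditional gaṇas (three-syllable feet) plus single guru/laghu.
GANA = {"ma": "GGG", "ya": "LGG", "ra": "GLG", "sa": "LLG", "ta": "GGL",
        "ja": "LGL", "bha": "GLL", "na": "LLL", "ga": "G", "la": "L"}

# A tiny, illustrative meter table (hypothetical; a real system needs many more).
METERS = {
    "Indravajrā": "ta ta ja ga ga",
    "Upendravajrā": "ja ta ja ga ga",
    "Vasantatilakā": "ta bha ja ja ga ga",
    "Mandākrāntā": "ma bha na ta ta ga ga",
}
PATTERN_TO_METER = {
    "".join(GANA[g] for g in ganas.split()): name
    for name, ganas in METERS.items()
}


def scan(line: str) -> str:
    """Return the guru/laghu pattern of one IAST line, e.g. 'GGLG...'."""
    text = unicodedata.normalize("NFC", line.lower())
    tokens = TOKEN.findall(text)  # spaces and punctuation are skipped
    weights = []
    for i, tok in enumerate(tokens):
        if tok in LONG_VOWELS:
            weights.append("G")
        elif tok in SHORT_VOWELS:
            # Collect everything between this vowel and the next vowel;
            # conjuncts across word boundaries count, as in classical prosody.
            cluster = []
            for nxt in tokens[i + 1:]:
                if nxt in VOWELS:
                    break
                cluster.append(nxt)
            closed = bool(cluster) and cluster[0] in CODA_MARKS
            weights.append("G" if closed or len(cluster) >= 2 else "L")
    return "".join(weights)


def detect_meter(line: str):
    """Return the meter name for one line, or None if it is not in the table."""
    return PATTERN_TO_METER.get(scan(line))


if __name__ == "__main__":
    # First line of Kālidāsa's Meghadūta, a standard Mandākrāntā example.
    line = "kaścit kāntāvirahaguruṇā svādhikārāt pramattaḥ"
    print(scan(line), detect_meter(line))
```

A production system would first map Devanagari or another Indic script to a common phonemic form, handle the optional line-final guru, and match against a much larger meter inventory; this sketch only shows the core weight-and-lookup step.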