🤖 AI Summary
Hume AI announced Octave 2, its second‑generation "speech‑language" text‑to‑speech engine, now in preview on the platform and through the API. Octave 2 extends multilingual support to 11 languages (Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish), runs about 40% faster than its predecessor with sub‑200 ms generation, and costs half as much as Octave 1 (dedicated deployments can push the cost under $0.01/min). Architecturally, it is a speech‑language model that jointly models text and audio prosody, so it better captures emotional tone, timbre, and timing, and it reliably pronounces rare words, numbers, and symbols. Hume attributes the latency and efficiency gains to a bespoke inference stack built with SambaNova and to deployment on advanced LLM inference chips.
The release introduces two first‑of‑their‑kind capabilities for generative speech systems: realistic voice conversion (swapping voices while preserving phonetic timing) and direct phoneme editing (fine‑grained control over pronunciation and emphasis). Both are useful for dubbing, actor stand‑ins, game dialogue, and post‑production touch‑ups. Octave 2 supports “instant cloning” from 15‑second samples and predicts accent transfer across languages; Hume promises more languages and evaluations soon. Hume also launched EVI 4 mini, which brings Octave 2 into a speech‑to‑speech API (for now, it must be paired with an external LLM). The combination of low latency, lower cost, multilingual fidelity, and granular phoneme control marks a notable step toward scalable, emotionally expressive voice agents and production workflows.