Show HN: Namo Turn Detector v1 – High-performance, semantic turn detection (github.com)

🤖 AI Summary
VideoSDK’s Namo-v1 is an open-source suite of semantic turn-detection models that decide when a speaker has finished an utterance, a core problem for natural, low-latency conversational agents. Instead of relying on silence heuristics, Namo applies natural-language understanding to distinguish complete from incomplete utterances, cutting interruptions and perceived latency in voice UIs.

The collection includes lightweight specialized DistilBERT-based models (~135 MB, <19 ms inference) and a unified multilingual model based on mmBERT (~295 MB, <29 ms inference). Reported peak accuracy reaches 97.3% for the specialized models, and the multilingual model averages 90.25% across 23 languages evaluated on 25,000+ utterances, with the top per-language scores for Turkish, Korean, Japanese, German, and Hindi.

Namo is built for production use: ONNX-quantized builds give a ~2.19× speedup (inference down from ~61.3 ms to ~28 ms) with negligible accuracy loss and doubled throughput. Integration is plug-and-play via VideoSDK Agents (examples and an inference script are provided), and each model ships with Colab notebooks for fine-tuning and evaluation. Licensed under Apache 2.0 and hosted on Hugging Face, Namo-v1 suits anyone building real-time multilingual voice agents or researching dialogue turn management, offering a practical, low-latency semantic approach that’s easy to test and adapt.
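To make the semantic-vs-silence distinction concrete, here is a minimal sketch of how an endpointer might combine a turn-completeness classifier with a silence timer. All names and thresholds here are illustrative assumptions, not the actual Namo or VideoSDK Agents API: a semantically complete utterance ends the turn after a short pause, while a clearly incomplete one keeps the long silence timeout a heuristic-only endpointer would always use.

```python
def decide_turn_end(p_complete: float,
                    silence_ms: float,
                    complete_threshold: float = 0.8,
                    incomplete_threshold: float = 0.2,
                    max_silence_ms: float = 2000.0) -> bool:
    """Return True if the agent should treat the user's turn as finished.

    p_complete -- hypothetical classifier probability that the
                  utterance so far is semantically complete
    silence_ms -- how long the user has currently been silent
    """
    if p_complete >= complete_threshold:
        # Semantically complete ("Book it for Tuesday."):
        # respond after only a brief pause.
        return silence_ms >= 200.0
    if p_complete <= incomplete_threshold:
        # Clearly mid-thought ("I was thinking we could..."):
        # fall back to the long silence timeout before interrupting.
        return silence_ms >= max_silence_ms
    # Ambiguous region: wait a moderate amount of time.
    return silence_ms >= 800.0
```

The payoff is latency: with a confident "complete" prediction the agent replies after ~200 ms of silence instead of a fixed multi-second timeout, which is where the reduction in perceived latency and interruptions comes from.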