I built an open source VAD that beats Silero, Pyannote, and WebRTC (github.com)

🤖 AI Summary
NOVA-VAD is a new open-source Voice Activity Detector (VAD) whose author reports that it outperforms established models such as Silero, Pyannote, and WebRTC, especially in noisy environments. It is designed to be lightweight, requiring neither a GPU nor PyTorch, and it explains its decisions. Accuracy, a small footprint, and explainability have historically been hard to combine in a single VAD. In tests on real-world audio datasets, NOVA-VAD is reported to reach 93% accuracy; it is also fully retrainable and returns a confidence score with each prediction. Its pipeline consists of a denoiser, more than 150 audio features, and an ensemble classifier that combines Random Forest and Gradient Boosting. Each prediction comes with a plain-English explanation of the factors behind it. These properties could make NOVA-VAD useful for real-time speech processing, particularly on edge devices.
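To make the idea of an averaged ensemble with a confidence score and a plain-English explanation more concrete, here is a minimal sketch. It is not NOVA-VAD's code: the function names, the two features (RMS energy and zero-crossing rate), the threshold, and the two hand-written scorers are all assumptions chosen for illustration. The scorers merely stand in for a trained Random Forest and a trained Gradient Boosting model, which in a real system would be fitted on labeled audio.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Features = Dict[str, float]
Scorer = Callable[[Features], float]  # returns an estimate of P(speech) in [0, 1]


def frame_features(frame: Sequence[float]) -> Features:
    """Compute RMS energy and zero-crossing rate for one audio frame."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    n = len(frame)
    rms = math.sqrt(sum(x * x for x in frame) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return {"rms": rms, "zcr": crossings / max(n - 1, 1)}


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical stand-ins for two trained models. The coefficients are
# made up for this sketch and were not learned from any data.
def scorer_a(f: Features) -> float:
    return _sigmoid(20.0 * (f["rms"] - 0.1))


def scorer_b(f: Features) -> float:
    return _sigmoid(15.0 * f["rms"] - 5.0 * f["zcr"] - 1.0)


def classify_frame(
    frame: Sequence[float],
    scorers: Sequence[Scorer] = (scorer_a, scorer_b),
    threshold: float = 0.5,
) -> Tuple[str, float, str]:
    """Soft-vote over the scorers; return (label, confidence, explanation)."""
    feats = frame_features(frame)
    # Soft voting: average the per-model speech probabilities.
    p_speech = sum(s(feats) for s in scorers) / len(scorers)
    label = "speech" if p_speech >= threshold else "non-speech"
    # Confidence is the averaged probability of the chosen label.
    confidence = p_speech if label == "speech" else 1.0 - p_speech
    details = ", ".join(f"{k}={v:.3f}" for k, v in sorted(feats.items()))
    explanation = f"{label} (confidence {confidence:.2f}); features: {details}"
    return label, confidence, explanation
```

Calling `classify_frame` on a list of float samples returns the label, the confidence, and a one-line explanation string. Averaging probabilities (soft voting) is only one way to combine models; the summary does not say which combination rule NOVA-VAD actually uses.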