Echogarden is an easy-to-use toolset of speech processing tools (github.com)

0 points 13 hours ago

🤖 AI Summary

Echogarden is a new open-source (GPL v3) speech toolset written in TypeScript for Node.js that bundles a broad set of speech-processing capabilities into a single npm package and CLI. It runs on x64 and ARM64 Windows, macOS and Linux without Python, Docker or platform-specific binaries: engines are implemented in pure TypeScript, ported via WebAssembly, or run through ONNX. Install it with npm (Node 18+ recommended) and use it either as a global CLI or as a library imported directly into an application; an internal package system downloads voices and models on demand. Echogarden ships high-quality offline TTS (Kokoro, VITS, eSpeak-NG) and 16+ other engines, including cloud providers. For STT it offers a TypeScript/ONNX port of Whisper, whisper.cpp and other engines. It supports speech-to-text, speech-to-text translation (from Whisper's 98 languages into English, with near word-level timing), speech-to-transcript alignment (DTW variants and guided decoding), VAD, denoising (RNNoise/NSNet2), source separation (MDX-NET), word-level timestamps and subtitle generation. Developers can enable ONNX CUDA providers for GPU acceleration. By consolidating offline and cloud engines into a JS-first, dependency-light package, Echogarden lowers the barrier to integrating advanced speech workflows into web, server and edge applications.
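For readers unfamiliar with the "DTW variants" mentioned above, here is a minimal sketch of classic dynamic time warping in TypeScript. It is not Echogarden's implementation: the `dtw` function and the `AlignmentStep` type are hypothetical names made up for illustration, and the inputs are plain number sequences, whereas a real aligner would presumably compare multi-dimensional audio feature frames.

```typescript
// Classic dynamic time warping (DTW) over two 1-D sequences.
// Illustrative only: NOT Echogarden's code. `dtw` and `AlignmentStep`
// are hypothetical names; real aligners compare audio feature frames.

type AlignmentStep = { a: number; b: number };

function dtw(a: number[], b: number[]): { cost: number; path: AlignmentStep[] } {
  const n = a.length;
  const m = b.length;
  if (n === 0 || m === 0) {
    throw new Error("Both sequences must be non-empty");
  }

  // acc[i][j] = minimal accumulated cost of aligning a[0..i] with b[0..j]
  const acc: number[][] = Array.from({ length: n }, () =>
    new Array<number>(m).fill(Infinity)
  );

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      const local = Math.abs(a[i] - b[j]); // local distance between elements
      if (i === 0 && j === 0) {
        acc[i][j] = local;
        continue;
      }
      const up = i > 0 ? acc[i - 1][j] : Infinity;
      const left = j > 0 ? acc[i][j - 1] : Infinity;
      const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity;
      acc[i][j] = local + Math.min(up, left, diag);
    }
  }

  // Backtrack from the last cell to recover the warping path.
  const path: AlignmentStep[] = [];
  let i = n - 1;
  let j = m - 1;
  while (true) {
    path.push({ a: i, b: j });
    if (i === 0 && j === 0) break;
    const up = i > 0 ? acc[i - 1][j] : Infinity;
    const left = j > 0 ? acc[i][j - 1] : Infinity;
    const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity;
    const best = Math.min(up, left, diag);
    if (diag === best) {
      i--;
      j--;
    } else if (up === best) {
      i--;
    } else {
      j--;
    }
  }
  path.reverse();

  return { cost: acc[n - 1][m - 1], path };
}

// Usage: the second sequence holds the value 2 for one extra step.
const result = dtw([1, 2, 3], [1, 2, 2, 3]);
console.log(result.cost, result.path);
```

The returned path pairs each index of one sequence with one or more indices of the other. In a speech aligner, that kind of pairing is what would let timestamps from a reference signal be mapped onto the recorded audio.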
