🤖 AI Summary
NanoWakeWord is an open-source, end-to-end framework that automates the creation of high-performance custom wake-word models. It combines intelligent dataset analysis (auto-config) with one-command training: drop raw audio (MP3/M4A/FLAC, etc.) into the prescribed folders and run a single CLI command to preprocess (ffmpeg required), synthesize missing positives/negatives, augment, extract features, and train a model. The engine selects the model architecture and hyperparameters based on dataset size and balance (DNN/LSTM/GRU/CNN/RNN are supported) and outputs lightweight inference artifacts (.onnx/.tflite), with a polished terminal UI and a full config.yaml for expert control. Install via pip (nanowakeword, or nanowakeword[train] for training); note that TF/TFLite conversion prefers Python ≤3.11, while .onnx workflows support Python 3.8–3.13.
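The preprocessing step above relies on ffmpeg to normalize raw audio before feature extraction. A minimal sketch of what such a conversion wrapper might look like; the 16 kHz mono target and the helper names are illustrative assumptions, not NanoWakeWord's actual code:

```python
import pathlib
import subprocess

def ffmpeg_cmd(src, dst, rate=16000):
    """Build an ffmpeg command converting any readable input (MP3/M4A/FLAC, ...)
    to mono WAV. 16 kHz mono is a common wake-word input format (an assumption
    here, not necessarily NanoWakeWord's exact preprocessing target)."""
    return [
        "ffmpeg", "-y",          # overwrite output if it exists
        "-i", str(src),          # input file in any ffmpeg-readable format
        "-ac", "1",              # downmix to mono
        "-ar", str(rate),        # resample to the target rate
        str(dst),
    ]

def to_wav(src: pathlib.Path, dst: pathlib.Path, rate: int = 16000) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_cmd(src, dst, rate), check=True, capture_output=True)
```

Keeping command construction separate from execution makes the ffmpeg invocation easy to inspect or unit-test without audio files present.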
For practitioners, NanoWakeWord lowers the barrier to building reliable always-on wake-word systems by automating many laborious ML steps and optimizing for edge deployment (Raspberry Pi compatible). Recommended inputs are roughly 400+ positive samples and at least 3× as much negative audio, though the built-in synthesizer helps compensate for small datasets; a GPU speeds up training, but CPU-only runs are supported for smaller jobs. The project emphasizes low false-positive rates (typically <0.2 FP/hour) and fast convergence (80%+ recall early in training), and it handles trade-offs such as adaptive model complexity and noise-aware batching automatically. It is Apache-2.0 licensed, ships a pre-trained "Arcosoph" model on Hugging Face, and invites community contributions.
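The adaptive model selection mentioned above can be illustrated with a toy heuristic. The thresholds mirror the recommended dataset sizes from this summary (~400+ positives, ≥3× negative audio), but the function and its decision boundaries are hypothetical, not NanoWakeWord's actual auto-config logic:

```python
def pick_architecture(n_pos: int, n_neg: int) -> str:
    """Hypothetical auto-config heuristic: map dataset size and balance
    to one of the supported architecture families (DNN/LSTM/GRU/CNN/RNN).
    Thresholds are illustrative only."""
    if n_pos < 400 or n_neg < 3 * n_pos:
        # Below the recommended dataset size/balance: keep the model small
        # to avoid overfitting (the synthesizer would top up the data).
        return "dnn"
    if n_pos < 2000:
        return "gru"   # mid-sized data: lightweight recurrent model
    return "cnn"       # large data: convolutional front-end pays off
```

The useful idea is the shape of the decision, not the exact cutoffs: complexity scales with how much (and how balanced) the data is.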