VoxScribe: STT Models Comparison Platform (github.com)

🤖 AI Summary
VoxScribe is a lightweight, open-source platform that unifies testing and comparison of multiple speech-to-text (STT) models behind a single FastAPI backend and a clean web UI. Aimed at startups and enterprises facing steep costs from proprietary transcription services, it enables side-by-side evaluation of models such as OpenAI Whisper, Mistral Voxtral, NVIDIA Parakeet and Canary-Qwen-2.5B: teams can upload audio, compare outputs and timestamps, and export results (CSV/text). Key user-facing features include model caching, drag-and-drop audio upload with preview, real-time status updates, and REST endpoints (/api/transcribe, /api/compare, /api/models and /api/status) so it can be integrated into existing pipelines.

Technically, VoxScribe tackles common deployment pain points: automatic dependency/version conflict resolution (notably reconciling transformers 4.56.0+, required by Voxtral, with 4.51.3, required by the NeMo models), background task processing, GPU-aware setup (CUDA/NVIDIA drivers, with AWS g6.xlarge as the recommended instance), and model management with easy extensibility. It ships with install buttons for missing packages, conda/ffmpeg guidance, a simple project layout (backend.py, public/ frontend, run.py) and a Dockerfile for containerized runs. The platform lowers the barrier to production-grade STT evaluation by reducing cost, accelerating model selection, and simplifying integration, while still requiring attention to CUDA, disk space, and model download constraints.
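As a rough illustration of how the REST endpoints and CSV export mentioned above might be used from a pipeline, here is a minimal Python sketch. Only the four endpoint paths come from the summary. The base URL, the assumption that an endpoint answers GET with JSON, the `{model: transcript}` result shape, and the helper names `endpoint_url`, `fetch_json` and `comparison_to_csv` are all hypothetical; the project's actual request and response formats should be checked against its README.

```python
import csv
import io
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Endpoint paths as named in the summary. Everything else is an assumption.
ENDPOINTS = {
    "transcribe": "/api/transcribe",
    "compare": "/api/compare",
    "models": "/api/models",
    "status": "/api/status",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a base URL (hypothetical, e.g. a local dev server) with an endpoint path."""
    return urljoin(base_url.rstrip("/") + "/", ENDPOINTS[name].lstrip("/"))


def fetch_json(base_url: str, name: str, timeout: float = 10.0):
    """GET an endpoint and decode the body as JSON.

    Assumes the endpoint accepts GET and returns JSON; this is not confirmed
    by the summary.
    """
    with urlopen(endpoint_url(base_url, name), timeout=timeout) as resp:
        return json.load(resp)


def comparison_to_csv(results: dict[str, str]) -> str:
    """Render an assumed {model_name: transcript} mapping as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["model", "transcript"])
    for model, text in results.items():
        writer.writerow([model, text])
    return buf.getvalue()
```

With a server running, `fetch_json("http://localhost:8000", "models")` would be the natural first call (the host and port here are placeholders), and `comparison_to_csv` can be applied to whatever per-model transcripts a comparison run produces.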