Benchmark code for evaluating different ASR packages and APIs (github.com)

🤖 AI Summary
The Open ASR Leaderboard repository supplies ready-to-run benchmark code and a Gradio Space (hf-audio/open_asr_leaderboard) for comparing automatic speech recognition systems across datasets on both accuracy (Word Error Rate, WER) and runtime efficiency (inverse real-time factor, RTFx). Evaluation is standardized: each library gets its own directory with a run_eval.py entrypoint and a model-specific bash wrapper that produces JSONL prediction manifests, computes WER and RTFx, and records per-sample timings. The repo supports streaming datasets and warm-up runs, and enforces consistent decoding hyperparameters across datasets to keep comparisons fair.

Technical notes for reproducibility and contributors: use Python 3.10+, install PyTorch per the official instructions, and log in with the Hugging Face CLI. NeMo requires CUDA 12.6 because of an RNN-T inference driver issue. Official results were produced on an NVIDIA A100-SXM4-80GB with driver 560.28.03, CUDA 12.6, and PyTorch 2.4.0, so equivalent setups are recommended when submitting results (maintainers can run evaluations on request).

Adding a new library entails forking the repo, adding a directory, adapting the provided template (model loading and inference steps), creating a run_<model_type>.sh script, and submitting a PR. The template shows a Transformers/Whisper example using bfloat16, processor-based preprocessing, model.generate, and postprocessing with text normalization, making it straightforward to plug in other ASR stacks while preserving consistent data loading and manifest formatting. A rough sketch of that pattern follows.
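The sketch below illustrates a run_eval.py-style loop for a Transformers/Whisper model: bfloat16 weights, processor-based preprocessing, model.generate, text normalization, and a JSONL manifest with per-sample timings. It is a minimal approximation, not the repo's actual script; the model ID, dataset, and manifest field names are placeholders.

```python
# Hypothetical run_eval.py-style loop (illustrative, not the leaderboard's code).
import json
import time

import torch
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"  # placeholder model

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)
model.eval()

# Stream the dataset so audio is decoded on the fly; LibriSpeech is only a
# stand-in for the leaderboard's curated test sets.
dataset = load_dataset("librispeech_asr", "clean", split="test", streaming=True)

records = []
for sample in dataset:
    audio = sample["audio"]
    inputs = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    )
    input_features = inputs.input_features.to(device, dtype=torch.bfloat16)

    # (The real harness also performs warm-up runs before timing.)
    start = time.perf_counter()
    with torch.no_grad():
        predicted_ids = model.generate(input_features)
    elapsed = time.perf_counter() - start  # per-sample timing, used for RTFx

    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    records.append(
        {
            "audio_duration": len(audio["array"]) / audio["sampling_rate"],
            "time": elapsed,
            # Whisper's English normalizer as a stand-in for the repo's normalizer.
            "pred_text": processor.tokenizer.normalize(transcription),
            "text": processor.tokenizer.normalize(sample["text"]),
        }
    )

# JSONL prediction manifest: one record per utterance.
with open("predictions.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```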
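Scoring then reduces to aggregating the manifest. The sketch below assumes the hypothetical field names written above and uses the evaluate library's WER metric; the leaderboard's own scoring code may differ in detail.

```python
# Read a JSONL prediction manifest and report WER and RTFx (sketch only).
import json

import evaluate

wer_metric = evaluate.load("wer")

references, predictions = [], []
total_audio_seconds = 0.0
total_inference_seconds = 0.0

with open("predictions.jsonl") as f:
    for line in f:
        record = json.loads(line)
        references.append(record["text"])
        predictions.append(record["pred_text"])
        total_audio_seconds += record["audio_duration"]
        total_inference_seconds += record["time"]

wer = 100 * wer_metric.compute(references=references, predictions=predictions)
# RTFx (inverse real-time factor): seconds of audio transcribed per second of compute.
rtfx = total_audio_seconds / total_inference_seconds

print(f"WER: {wer:.2f}%  RTFx: {rtfx:.2f}")
```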