🤖 AI Summary
The Open ASR Leaderboard repository supplies ready-to-run benchmark code and a Gradio Space (hf-audio/open_asr_leaderboard) for comparing automatic speech recognition systems across datasets, measuring both accuracy (Word Error Rate, WER) and runtime efficiency (Inverse Real-Time Factor, RTFx: seconds of audio transcribed per second of compute, so higher is faster). It standardizes evaluation: each library gets its own directory with a run_eval.py entrypoint, invoked by model-specific bash wrappers, which writes JSONL prediction manifests, computes WER and RTFx, and records per-sample timings. The repo supports streaming datasets and warm-up runs, and enforces consistent decoding hyperparameters across datasets to keep comparisons fair.
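To make the two metrics concrete, here is a minimal sketch of scoring a prediction manifest. The field names (text, pred_text, audio_duration, transcription_time) and the use of the jiwer library are illustrative assumptions, not the repo's exact schema or WER implementation:

```python
import json

import jiwer  # third-party WER library; the repo may use a different implementation


def score_manifest(path: str) -> tuple[float, float]:
    """Compute corpus-level WER and RTFx from a JSONL prediction manifest.

    Field names below are hypothetical; check the repo's manifests
    for the actual schema.
    """
    refs, hyps = [], []
    total_audio, total_compute = 0.0, 0.0
    with open(path) as f:
        for line in f:
            sample = json.loads(line)
            refs.append(sample["text"])           # ground-truth transcript
            hyps.append(sample["pred_text"])      # model prediction
            total_audio += sample["audio_duration"]        # seconds of audio
            total_compute += sample["transcription_time"]  # seconds of compute

    wer = jiwer.wer(refs, hyps)
    # RTFx is the *inverse* real-time factor: audio seconds transcribed
    # per compute second, so higher means faster.
    rtfx = total_audio / total_compute
    return wer, rtfx


if __name__ == "__main__":
    wer, rtfx = score_manifest("predictions.jsonl")
    print(f"WER: {wer:.2%}  RTFx: {rtfx:.1f}")
```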
Technical notes for reproducibility and contributors: use Python 3.10+, install PyTorch per the official instructions, and log in with the Hugging Face CLI. NeMo requires CUDA 12.6 because of an RNN-T inference driver issue. Official results were produced on an NVIDIA A100-SXM4-80GB with driver 560.28.03, CUDA 12.6, and PyTorch 2.4.0, so an equivalent setup is recommended when submitting results (maintainers can run evaluations on request). Adding a new library entails forking the repo, adding a directory, adapting the provided template (model loading and inference steps), creating a run_<model_type>.sh script, and submitting a PR. The template shows a Transformers/Whisper example using bfloat16, processor-based preprocessing, model.generate, and postprocessing with text normalization, making it straightforward to plug in other ASR stacks while preserving consistent data loading and manifest formatting; a sketch of that pattern follows.
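The gist of that template looks roughly like the code below. This is a hedged sketch rather than the repo's actual run_eval.py: the dataset name, config, and split are assumptions, and the real script additionally writes JSONL manifests, records per-sample timings, and performs warm-up runs:

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration
from transformers.models.whisper.english_normalizer import EnglishTextNormalizer

MODEL_ID = "openai/whisper-large-v3"  # any Whisper checkpoint works here

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = (
    WhisperForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16  # bfloat16, as in the template
    )
    .to("cuda")
    .eval()
)

# Streaming avoids downloading the whole dataset up front; this dataset
# name and split are assumptions for illustration.
dataset = load_dataset(
    "hf-audio/esb-datasets-test-only-sorted",
    "librispeech",
    split="test.clean",
    streaming=True,
)

# Whisper-style English text normalization before scoring predictions.
normalizer = EnglishTextNormalizer(english_spelling_mapping={})

for sample in dataset.take(4):
    audio = sample["audio"]
    # Processor-based preprocessing: raw waveform -> log-mel input features.
    inputs = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    )
    input_features = inputs.input_features.to("cuda", dtype=torch.bfloat16)
    with torch.no_grad():
        predicted_ids = model.generate(input_features)
    text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    print(normalizer(text))
```

Swapping in another ASR stack means replacing the model loading and the inner inference step while keeping the data loading, normalization, and manifest formatting unchanged, which is what keeps cross-library comparisons fair.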