Omnilingual ASR: Advancing automatic speech recognition for 1600 languages (ai.meta.com)

🤖 AI Summary
Meta FAIR has released Omnilingual ASR, an open-source suite of automatic speech recognition models and a new corpus that together bring transcription to more than 1,600 languages, including roughly 500 languages never before transcribed by AI. The release comprises a massively multilingual wav2vec 2.0 foundation model scaled to 7 billion parameters, two decoding variants (a CTC-based decoder and a transformer "LLM-ASR" decoder), and the Omnilingual ASR Corpus, a commissioned collection of transcribed spontaneous speech in 350 underserved languages assembled with global partners. All code is released under Apache 2.0, the data under CC-BY, and everything is built on FAIR's fairseq2 framework.

Technically, Omnilingual ASR advances the field by pairing a large self-supervised speech encoder with LLM-inspired in-context learning, so a new language can reach usable transcription quality from only a handful of paired examples (few-shot) rather than costly large-scale labeling. The 7B LLM-ASR variant achieves state-of-the-art results across the 1,600+ languages, with a character error rate below 10 for 78% of them, and models are provided at sizes from 300M to 7B parameters to suit everything from low-power devices to high-accuracy deployments.

The release lowers the barrier for research and deployment in low-resource languages, expands multilingual speech tooling beyond internet-heavy languages, and supplies the community with large, diverse training assets to accelerate inclusive speech technology.
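The few-shot idea is easiest to see in code. The sketch below is a minimal, hypothetical illustration of conditioning an LLM-style ASR decoder on a handful of paired (audio, transcript) examples before transcribing a new utterance; the class and function names (ToyEncoder, few_shot_transcribe, the dummy decoder) are placeholders and do not reflect the actual fairseq2 or Omnilingual ASR APIs.

```python
# Conceptual sketch (not the real fairseq2 / Omnilingual ASR API) of few-shot,
# in-context conditioning for an LLM-style ASR decoder: a handful of paired
# (audio, transcript) examples are encoded and handed to the decoder as
# context before it transcribes a new utterance in the same language.
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Stand-in for a wav2vec 2.0-style self-supervised speech encoder."""

    def __init__(self, dim: int = 64) -> None:
        super().__init__()
        # Strided conv roughly mimics waveform -> frame-level embeddings.
        self.proj = nn.Conv1d(1, dim, kernel_size=400, stride=320)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> embeddings: (batch, frames, dim)
        return self.proj(waveform.unsqueeze(1)).transpose(1, 2)


def few_shot_transcribe(encoder, decoder, examples, target_audio):
    """Encode a few (audio, transcript) pairs as in-context examples,
    then ask the decoder to transcribe the target utterance."""
    context = [(encoder(audio.unsqueeze(0)), transcript)
               for audio, transcript in examples]
    target_emb = encoder(target_audio.unsqueeze(0))
    # A real LLM-ASR decoder would attend over `context` and autoregressively
    # emit text for `target_emb`; here the decoder is a placeholder callable.
    return decoder(context, target_emb)


if __name__ == "__main__":
    enc = ToyEncoder()
    dummy_decoder = lambda ctx, tgt: f"<transcript conditioned on {len(ctx)} examples>"
    # Three paired examples (~1 s of 16 kHz audio each) plus one target clip.
    examples = [(torch.randn(16000), "reference transcript") for _ in range(3)]
    print(few_shot_transcribe(enc, dummy_decoder, examples, torch.randn(16000)))
```

In the released system, the encoder role is played by the 7B wav2vec 2.0 foundation model and the decoding by the CTC or LLM-ASR heads; the point of the sketch is only that the decoder consumes paired examples as context instead of requiring per-language fine-tuning on large labeled sets.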