Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration (github.com)

0 points 116 days ago | visit original

🤖 AI Summary

Parakeet.cpp is a pure C++ implementation of automatic speech recognition (ASR) inference for NVIDIA's Parakeet models, with Metal GPU acceleration. It does not depend on ONNX or a Python runtime, which keeps deployment lightweight. On Apple Silicon, encoder inference takes about 27 ms for 10 seconds of audio, roughly 96 times faster than running on the CPU. Parakeet.cpp supports several ASR models, including multilingual ones, in both streaming and offline configurations. The models share a FastConformer encoder, and Metal acceleration is provided through the axiom tensor library. Users can switch between the CTC and TDT decoders at runtime to trade off speed and accuracy, and the project also offers word-level timestamps and speaker diarization for up to four speakers. The combination of fast inference and configurable decoding makes Parakeet.cpp a candidate for real-time speech applications.
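To make the CTC decoder mentioned above more concrete, here is a minimal sketch of generic greedy CTC decoding in C++. It is not taken from Parakeet.cpp: the function name `ctc_greedy_decode`, the `frames` layout (one score vector per encoder time step), and the `blank_id` parameter are all assumptions made for illustration, and the project's real decoder may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Parakeet.cpp's API.
// Greedy CTC decoding: take the best-scoring token at every time step,
// collapse consecutive repeats, then drop the blank symbol.
// frames[t][v] is the score (logit or log-probability) of token v at step t.
// blank_id is the index of the CTC blank token (model-specific, assumed here).
std::vector<int> ctc_greedy_decode(const std::vector<std::vector<float>>& frames,
                                   int blank_id) {
    std::vector<int> tokens;
    int previous = blank_id;  // treat the start of the sequence as a blank
    for (const auto& scores : frames) {
        if (scores.empty()) continue;  // skip malformed frames
        // Argmax over the vocabulary for this frame.
        std::size_t best = 0;
        for (std::size_t v = 1; v < scores.size(); ++v) {
            if (scores[v] > scores[best]) best = v;
        }
        const int id = static_cast<int>(best);
        // Emit only when the token is not blank and differs from the last frame.
        if (id != blank_id && id != previous) tokens.push_back(id);
        previous = id;
    }
    return tokens;
}
```

With blank index 0, frames whose per-step best tokens are 1, 1, 0, 1, 2, 2, 0 decode to the token sequence 1, 1, 2: the repeated 1 collapses, and the blank between the two 1s keeps them as separate emissions. A TDT decoder additionally predicts how many frames to skip after each token, which this sketch does not cover.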
