🤖 AI Summary
A developer built a compact Python audio-transcription toolkit that runs OpenAI's Whisper locally, so you can convert hours of speech to text without uploading sensitive audio to third-party services. The post provides a ready-to-run AudioTranscriber class (whisper.load_model, transcribe_file, language detection, segment timestamps, save_transcription), a CLI wrapper, batch processing, and SRT subtitle generation, plus an alternative pipeline built on speech_recognition + pydub. The author flags FFmpeg as a critical dependency and recommends creating a virtualenv and installing openai-whisper via pip.
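The post's code isn't reproduced in this summary, but a minimal sketch of what the described class might look like follows, assuming openai-whisper's documented API (whisper.load_model, model.transcribe); the method names transcribe_file and save_transcription come from the summary, while the bodies and the SRT helper are illustrative assumptions:

```python
# Illustrative sketch only: method names follow the post's summary; the
# bodies are assumptions built on openai-whisper's documented API.
# Requires: pip install openai-whisper (plus FFmpeg on the system PATH).
import whisper


class AudioTranscriber:
    def __init__(self, model_size: str = "base"):
        # Downloads the model weights on first use.
        self.model = whisper.load_model(model_size)

    def transcribe_file(self, path: str, language: str | None = None) -> dict:
        # language=None lets Whisper auto-detect; pass e.g. "en" to pin it.
        # The result dict carries "text", the detected "language", and
        # timestamped "segments".
        return self.model.transcribe(path, language=language)

    def save_transcription(self, result: dict, out_path: str) -> None:
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(result["text"])

    def save_srt(self, result: dict, out_path: str) -> None:
        # Hypothetical helper: formats Whisper's segments as SRT subtitles.
        def ts(sec: float) -> str:
            h, rem = divmod(sec, 3600)
            m, s = divmod(rem, 60)
            ms = int((s % 1) * 1000)
            return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{ms:03d}"

        with open(out_path, "w", encoding="utf-8") as f:
            for i, seg in enumerate(result["segments"], start=1):
                f.write(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n")
                f.write(f"{seg['text'].strip()}\n\n")
```

Under those assumptions, usage reduces to something like AudioTranscriber("small").transcribe_file("interview.mp3").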
Why it matters: local Whisper gives researchers, journalists, and creators a privacy-preserving, cost-free transcription option with production-ready features (automatic language detection, segmentation, subtitle output). The write-up lays out practical trade-offs across Whisper model sizes (tiny→large) with RAM, speed, and accuracy numbers (e.g., base ≈3.8 min per hour of audio at 94% accuracy; small ≈10 min/hr at 96%; medium ≈30 min/hr at 97%), hardware guidance (8–16 GB RAM suggestions, 3–5× speedup on GPU), and common failure modes. It also includes actionable fixes, sketched below: use smaller models or chunk long audio to avoid out-of-memory errors, preprocess/noise-reduce audio (normalize, high-pass filter), and specify the language up front to boost accuracy. Overall, it's a concise, runnable blueprint for secure, high-quality offline transcription.
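To make those fixes concrete, here is a hedged sketch of chunking plus preprocessing with pydub feeding a small Whisper model; the ten-minute chunk length, 100 Hz high-pass cutoff, file name, and "en" language choice are illustrative assumptions, not values from the post:

```python
# Hedged sketch of the fixes listed above: normalize, high-pass filter,
# chunk long audio, use a smaller model, and pin the language explicitly.
# Chunk size, cutoff frequency, and file names are assumptions.
import tempfile

import whisper
from pydub import AudioSegment
from pydub.effects import normalize


def preprocess_and_chunk(path: str, chunk_ms: int = 10 * 60 * 1000):
    audio = AudioSegment.from_file(path)             # decoding needs FFmpeg
    audio = normalize(audio).high_pass_filter(100)   # tame low-frequency rumble
    for start in range(0, len(audio), chunk_ms):     # len() is in milliseconds
        yield audio[start:start + chunk_ms]


model = whisper.load_model("tiny")  # smaller model = lower peak memory
pieces = []
for chunk in preprocess_and_chunk("long_interview.mp3"):
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        chunk.export(tmp.name, format="wav")
        # Specifying the language skips auto-detection and boosts accuracy.
        pieces.append(model.transcribe(tmp.name, language="en")["text"])
print(" ".join(pieces))
```

Chunking trades a little context at chunk boundaries for a bounded memory footprint, which is usually the right call for multi-hour recordings on 8 GB machines.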