🤖 AI Summary
Developer Illyism released a tiny open-source CLI, @illyism/transcribe, that takes the pain out of transcribing large videos by extracting and optimizing audio locally before sending it to OpenAI's Whisper API. Instead of uploading multi-gigabyte MP4s, the tool uses FFmpeg to extract the audio in seconds, speeds it up 1.2×, and, if needed, applies Opus compression to stay under Whisper's ~25MB file limit, then uploads only a ~10–30MB file for transcription. It handles local files or YouTube URLs (via yt-dlp), maps timestamps back to the original timeline, and emits SRT subtitles. The package is TypeScript/Node, tiny (312KB), installable via npx/npm, and MIT-licensed on GitHub.
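The extract-then-compress step described above can be sketched as an FFmpeg argument builder. This is an illustrative sketch, not the tool's actual implementation: the function name is hypothetical, and the `atempo` filter and libopus bitrate are assumptions about how the speed-up and Opus compression might be wired.

```typescript
// Hypothetical sketch of the FFmpeg invocation such a tool could perform.
// Assumptions: atempo for the 1.2x speed-up, libopus at a low bitrate
// to stay under Whisper's ~25MB upload cap.
function buildFfmpegArgs(
  input: string,
  output: string,
  speed = 1.2,
  useOpus = false,
): string[] {
  const args = [
    "-i", input,                    // source video (local file or yt-dlp download)
    "-vn",                          // drop the video stream, keep audio only
    "-filter:a", `atempo=${speed}`, // speed audio up without changing pitch
  ];
  if (useOpus) {
    // Opus compresses speech well at low bitrates with little accuracy loss
    args.push("-c:a", "libopus", "-b:a", "32k");
  }
  args.push(output);
  return args;
}
```

A caller would pass these args to a spawned `ffmpeg` process, falling back to the Opus path only when the plain extraction exceeds the size limit.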
Why it matters: in A/B tests, a 2.7GB, 22-minute video used 99.5% less bandwidth, transcribed about 9% faster, and kept ~98% transcription accuracy compared with a naive upload workflow, saving time and bandwidth for podcasters, creators, and devs who routinely work with long-form video. The design highlights practical ML deployment principles: local preprocessing to reduce I/O, light audio optimizations that preserve model accuracy, and robust tooling (yt-dlp over ytdl-core) to improve reliability. It's an immediately usable, low-cost ($0.006/min via Whisper) way to make large-scale transcription workflows far more efficient.
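Because transcription runs on the 1.2×-speed audio, Whisper's timestamps land on a compressed timeline and must be rescaled before subtitles line up with the original video. A minimal sketch of that adjustment, with illustrative function names (not the package's API), plus SRT timestamp formatting:

```typescript
// Sketch: 10s of 1.2x-speed audio corresponds to 12s of the original video,
// so timestamps from the model are multiplied back up by the speed factor.
const SPEED = 1.2; // assumed speed-up factor, matching the article

function toOriginalSeconds(fastSeconds: number, speed = SPEED): number {
  return fastSeconds * speed;
}

// SRT uses the fixed layout HH:MM:SS,mmm for cue timestamps.
function toSrtTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(rem, 3)}`;
}
```

So a segment Whisper places at 10s in the sped-up audio is emitted as the cue time `toSrtTimestamp(toOriginalSeconds(10))`, i.e. 12 seconds on the original timeline.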