🤖 AI Summary
Developer Illyism released a tiny open-source CLI, @illyism/transcribe, that takes the pain out of transcribing large videos by extracting and optimizing audio locally before sending it to OpenAI's Whisper API. Instead of uploading multi-gigabyte MP4s, the tool uses FFmpeg to extract the audio in seconds, speeds it up 1.2×, and, if needed, applies Opus compression to stay under Whisper's ~25MB file limit, then uploads only a ~10–30MB file for transcription. It handles local files or YouTube URLs (via yt-dlp), maps timestamps back to the original timeline, and emits SRT subtitles. The package is TypeScript/Node, tiny (312KB), installable via npx/npm, and MIT-licensed on GitHub.
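The extract-then-compress step described above can be sketched as an FFmpeg argument builder. This is an illustrative sketch, not the tool's actual implementation: the function name is hypothetical, and the `atempo` filter and libopus bitrate are assumptions about how the speed-up and Opus compression might be wired.

```typescript
// Hypothetical sketch of the FFmpeg invocation such a tool could perform.
// Assumptions: atempo for the 1.2x speed-up, libopus at a low bitrate
// to stay under Whisper's ~25MB upload cap.
function buildFfmpegArgs(
  input: string,
  output: string,
  speed = 1.2,
  useOpus = false,
): string[] {
  const args = [
    "-i", input,                    // source video (local file or yt-dlp download)
    "-vn",                          // drop the video stream, keep audio only
    "-filter:a", `atempo=${speed}`, // speed audio up without changing pitch
  ];
  if (useOpus) {
    // Opus compresses speech well at low bitrates with little accuracy loss
    args.push("-c:a", "libopus", "-b:a", "32k");
  }
  args.push(output);
  return args;
}
```

A caller would pass these args to a spawned `ffmpeg` process, falling back to the Opus path only when the plain extraction exceeds the size limit.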
Why it matters: in A/B tests, a 2.7GB, 22-minute video used 99.5% less bandwidth, transcribed about 9% faster, and kept ~98% transcription accuracy compared with a naive upload workflow, saving time and bandwidth for podcasters, creators, and devs who routinely work with long-form video. The design highlights practical ML deployment principles: local preprocessing to reduce I/O, light audio optimizations that preserve model accuracy, and robust tooling (yt-dlp over ytdl-core) to improve reliability. It's an immediately usable, low-cost ($0.006/min via Whisper) way to make large-scale transcription workflows far more efficient.
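Because transcription runs on the 1.2×-speed audio, Whisper's timestamps land on a compressed timeline and must be rescaled before subtitles line up with the original video. A minimal sketch of that adjustment, with illustrative function names (not the package's API), plus SRT timestamp formatting:

```typescript
// Sketch: 10s of 1.2x-speed audio corresponds to 12s of the original video,
// so timestamps from the model are multiplied back up by the speed factor.
const SPEED = 1.2; // assumed speed-up factor, matching the article

function toOriginalSeconds(fastSeconds: number, speed = SPEED): number {
  return fastSeconds * speed;
}

// SRT uses the fixed layout HH:MM:SS,mmm for cue timestamps.
function toSrtTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(rem, 3)}`;
}
```

So a segment Whisper places at 10s in the sped-up audio is emitted as the cue time `toSrtTimestamp(toOriginalSeconds(10))`, i.e. 12 seconds on the original timeline.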