The Death of the Demo (lielvilla.com)

🤖 AI Summary
Liel Villa calls out a growing gap between viral, curated AI demo clips and production reality. After building WonderPods (AI-generated kids’ podcast episodes), Villa found that only about 3–10% of outputs are “incredible,” 70–80% are mediocre, and 10–30% are outright bad. High-profile text-to-speech models show consistent, repeatable failure modes at scale: mid-sentence artifacts, volume drift, inexplicable speed changes, and mispronunciations. Vendor fixes, such as raising a “stability” parameter or splitting narration into short segments, reduce errors but flatten expressiveness or break continuity and throughput, revealing tradeoffs that polished demos hide. Villa argues the community needs robust, production-oriented benchmarks for TTS like those used for LLMs. Proposed metrics include a LUFS spread for volume stability (95th minus 5th percentile LUFS across 10-second segments), the coefficient of variation of words-per-minute over 15-second windows for pacing, Word Error Rate via ASR on challenging text for pronunciation, and a synthetic-detection score from a standardized classifier to quantify “human-ness.” Crucially, these must be measured across hundreds or thousands of diverse generations, not single curated clips. The thesis: to move generative audio from viral demos to dependable products, we must measure consistency and failure modes systematically rather than hype isolated highlights.
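The two acoustic metrics can be sketched in a few lines. This is a hypothetical illustration, not Villa's implementation: the function names and sample inputs are invented, and a real pipeline would obtain per-segment LUFS from an ITU-R BS.1770 loudness meter (e.g. pyloudnorm) and word timestamps from ASR or a forced aligner.

```python
# Sketch of two proposed production metrics for TTS output, assuming the
# raw measurements (per-segment LUFS values, word-onset timestamps) have
# already been extracted from the generated audio.
from statistics import mean, pstdev

def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

def lufs_spread(segment_lufs):
    """Volume stability: 95th minus 5th percentile of per-10s-segment LUFS.

    Larger spread means more volume drift across the generation.
    """
    return percentile(segment_lufs, 95) - percentile(segment_lufs, 5)

def wpm_cv(word_times, window=15.0):
    """Pacing stability: coefficient of variation of words-per-minute
    computed over fixed-length windows (default 15 s).

    word_times: sorted word-onset timestamps in seconds.
    """
    end = word_times[-1]
    wpms = []
    t = 0.0
    while t < end:
        n = sum(1 for w in word_times if t <= w < t + window)
        wpms.append(n * 60.0 / window)
        t += window
    m = mean(wpms)
    return pstdev(wpms) / m if m else 0.0
```

A perfectly steady narration (one word every 0.5 s) yields a WPM coefficient of variation of 0; a clip that races through its first half and crawls through its second scores much higher. Either metric would then be aggregated over hundreds of generations, per the article's point, rather than reported for a single clip.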