Show HN: Generate storyboards from YAML with Gemini (image and TTS) (github.com)

0 points 225 days ago | visit original

🤖 AI Summary

"storyboard" is a new open-source CLI (Show HN) for authoring image-and-audio storyboards in declarative YAML and generating the assets with Google’s Gemini models. You define characters (each with reference photos and a TTS voice and style), reusable image and TTS templates, and scenes made of frames that fill template variables and reference other YAML entries. The workflow has four steps: storyboard init prompts for a project name and a Gemini API key; storyboard generate writes images and TTS audio to output/; storyboard serve lets you view the result interactively; and storyboard composite movie stitches the frames into a single MP4. Convenience subcommands generate a single image or TTS clip, and selectively update or regenerate specific assets. By default it uses the gemini-3-pro-image-preview and gemini-2.5-flash-preview-tts models. It supports reference photos, variable substitution, cross-references (e.g., @characters._nick.reference_photo), and caching keyed on a SHA of prompt+model to avoid regeneration, with cache directories at .storyboard/generated/images and .storyboard/generated/audio. Configurable options include concurrency, timeouts, retry behavior, image/audio optimization, frame timing rules (use the audio length, or a default of 5s), and voice IDs (e.g., Fenrir, Aoede). For creators and ML practitioners, this offers a reproducible, template-driven pipeline for rapid audiovisual prototyping with media generated by large multimodal models (LMMs), while exposing practical trade-offs around API keys, compute and concurrency limits, caching, and media optimization.
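The caching and frame-timing rules described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the summary only says the cache key is a "SHA of prompt+model", so the choice of SHA-256, the NUL separator, the .png extension, and the function names cache_key, cached_image_path, and frame_duration are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Cache directory and default frame length as named in the summary.
CACHE_DIR = Path(".storyboard/generated/images")
DEFAULT_FRAME_SECONDS = 5.0


def cache_key(prompt: str, model: str) -> str:
    """Hypothetical cache key: SHA-256 over model and prompt.

    Assumption: the real tool may use a different SHA variant or input layout.
    The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    """
    digest = hashlib.sha256()
    digest.update(model.encode("utf-8"))
    digest.update(b"\0")
    digest.update(prompt.encode("utf-8"))
    return digest.hexdigest()


def cached_image_path(prompt: str, model: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Where a generated image would live if files are named by cache key.

    Assumption: the .png extension and key-as-filename layout are illustrative.
    """
    return cache_dir / f"{cache_key(prompt, model)}.png"


def frame_duration(audio_seconds: Optional[float]) -> float:
    """Use the audio length when a frame has audio, else the 5s default."""
    if audio_seconds is not None:
        return audio_seconds
    return DEFAULT_FRAME_SECONDS
```

With a scheme like this, an unchanged prompt and model map to the same file, so a generator can skip the API call whenever cached_image_path(prompt, model) already exists; changing either input yields a new key and triggers regeneration.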
