Show HN: Thumbnail Bench 1.0 (tubesalt.com)

🤖 AI Summary
Thumbnail Bench 1.0 is a human-evaluated benchmark that measures how well text-to-image models generate YouTube-style thumbnails from real production TubeSalt templates. Each model received identical prompts and was run with default API settings. Human raters scored the outputs on 10–15 practical criteria, including anatomical accuracy, skin quality, text/graphics quality, spelling, legibility, composition, framing, and prompt matching. Scores are averaged across multiple template generations (AVG@10), and the resulting leaderboard ranks models by how “thumbnail-ready” their outputs are.

Results: Google’s Imagen 4 Preview leads at 90.7%, followed by Black Forest Labs’ Flux Pro Kontext Max (88.9%) and Tencent’s Hunyuan Image V3 (88.2%). Other contenders include Flux Pro Kontext (86.0%) and ByteDance’s Seedream V4 (81.8%).

Significance: the benchmark prioritizes production concerns (text readability, accurate anatomy, and faithful prompt following) over generic perceptual metrics, which makes it directly actionable for creators and platforms that automate thumbnail generation. Practical implications include informing model selection for end-to-end pipelines, highlighting the value of strong text rendering and composition, and showing where post-processing or fine-tuning is still needed to fix spelling and framing.

Caveats: the results reflect default API settings and a fixed set of prompts and templates, and they are subject to variance between human evaluators, so rankings may shift with prompt engineering, greater template diversity, or model updates.
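The summary says only that scores are averaged over template generations (AVG@10). It does not say how the 10–15 criteria are graded or weighted. The Python snippet below is a minimal sketch of that kind of aggregation. It assumes each human judgement is a binary pass/fail per criterion, that all criteria carry equal weight, and that AVG@10 is the mean over 10 generations. The function names `generation_score` and `avg_at_k` and the criterion keys are hypothetical and are not TubeSalt's actual scoring code.

```python
from statistics import mean

# ASSUMPTION: binary pass/fail per criterion, equal weights. The benchmark's
# real grading scheme is not described in the summary; names are hypothetical.

def generation_score(criteria: dict[str, bool]) -> float:
    """Fraction of criteria a single generated thumbnail passes (0.0 to 1.0)."""
    if not criteria:
        raise ValueError("a generation needs at least one scored criterion")
    return sum(criteria.values()) / len(criteria)

def avg_at_k(generations: list[dict[str, bool]], k: int = 10) -> float:
    """Mean per-generation score over the first k generations, as a percentage."""
    if len(generations) < k:
        raise ValueError(f"need at least {k} generations, got {len(generations)}")
    return 100 * mean(generation_score(g) for g in generations[:k])

# Usage: nine thumbnails pass every criterion, one passes two of four.
runs = [{"anatomy": True, "spelling": True, "legibility": True, "framing": True}] * 9
runs.append({"anatomy": True, "spelling": False, "legibility": True, "framing": False})
print(round(avg_at_k(runs), 1))  # 95.0
```

Equal weighting keeps the sketch simple. A real rubric might weight spelling or anatomy more heavily, or use graded rather than binary judgements. Either change would alter the percentages.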