Automatic Video Generation from Scientific Papers (showlab.github.io)

🤖 AI Summary
Academic presentation videos are time-consuming to produce but crucial for research dissemination. The authors introduce Paper2Video, a benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata (on average 13.3K words, 44.7 figures, and 28.7 pages per paper, with roughly 16 slides and 6 minutes 15 seconds of video). Recognizing that presentation videos are judged by how well they communicate scholarship rather than by visual fidelity alone, they propose four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) that separately measure audience comprehension and author-aligned fidelity for this long-horizon, multimodal task integrating text, figures, slides, speech, and a human talker.

Building on this benchmark, they present PaperTalker, the first multi-agent pipeline that automatically generates academic presentation videos from papers. The system decomposes the task into four builder agents (slide generation with layout refinement via tree-search visual choice, cursor grounding, subtitling and speech synthesis, and talking-head rendering) and parallelizes slide-wise generation for efficiency. Experiments on Paper2Video show that PaperTalker produces videos measurably more faithful and informative than existing baselines, a practical step toward scalable, ready-to-use academic video generation and automated scholarly communication at scale.
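The pipeline decomposition described above can be pictured with a small orchestration sketch. This is a hypothetical illustration only: the function names (build_slide, ground_cursor, synthesize_speech, make_slide_asset) and their interfaces are assumptions, not the authors' actual API; the sketch shows only the four-builder decomposition and the slide-wise parallelism the summary mentions.

```python
# Hypothetical sketch of a PaperTalker-style pipeline: per-slide builder stages
# (slide layout, cursor grounding, subtitles/speech) run independently for each
# slide, so slide-wise generation can be parallelized before talking-head
# rendering and final assembly. All names here are assumptions for illustration.

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SlideAsset:
    slide_image: bytes   # rendered slide
    cursor_track: list   # (x, y) positions grounded to slide content
    audio: bytes         # synthesized narration for the slide
    subtitles: str


def build_slide(section: str) -> bytes:
    """Builder 1: draft slide content and refine its layout, e.g. by scoring a
    few candidate layouts and keeping the best (the tree-search visual choice
    mentioned above). Stubbed with placeholder bytes."""
    return f"<slide for: {section[:40]}>".encode()


def ground_cursor(slide_image: bytes, subtitles: str) -> list:
    """Builder 2: map spoken phrases to slide regions to get cursor positions."""
    return [(0.5, 0.5)]  # placeholder


def synthesize_speech(subtitles: str) -> bytes:
    """Builder 3: turn subtitles into narration audio via TTS."""
    return b"<audio>"  # placeholder


def make_slide_asset(section: str) -> SlideAsset:
    """Generate everything needed for one slide; independent across slides,
    which is what makes the parallel dispatch below possible."""
    slide_image = build_slide(section)
    subtitles = f"Narration drafted from: {section[:60]}"  # placeholder
    audio = synthesize_speech(subtitles)
    cursor_track = ground_cursor(slide_image, subtitles)
    return SlideAsset(slide_image, cursor_track, audio, subtitles)


def generate_presentation(paper_sections: list[str]) -> list[SlideAsset]:
    """Slide-wise parallel generation; Builder 4 (talking-head rendering) and
    video assembly would consume these assets afterwards."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(make_slide_asset, paper_sections))


if __name__ == "__main__":
    assets = generate_presentation(["Introduction ...", "Method ...", "Results ..."])
    print(len(assets), "slide assets generated")
```

ThreadPoolExecutor here only stands in for whatever scheduling the real system uses; the point is that per-slide work is independent and can be dispatched concurrently before the talking-head and assembly stage.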