🤖 AI Summary
Researchers introduced Paper2Video, a new benchmark and system for automatically generating academic presentation videos from research papers. The benchmark pairs 101 papers with author-created presentation videos, slides, and speaker metadata, and defines four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) to quantify how well a generated video conveys a paper's content. The team also released PaperTalker, the first multi-agent framework to automate the full academic video production pipeline, aiming to cut the hours of manual slide design, recording, and editing that a short presentation normally requires.
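The summary only names the metrics, but one plausible shape for a quiz-style measure like PresentQuiz is to derive questions from the paper and check how many a viewer model answers correctly from the video alone. The sketch below assumes that protocol; it is an illustration, not the paper's specification, and `ask_video_model` is a hypothetical callable wrapping some video-QA model.

```python
# Hedged sketch of a quiz-style evaluation (assumed protocol, not the
# paper's exact definition of PresentQuiz).
from typing import Callable

def quiz_accuracy(
    video_path: str,
    qa_pairs: list[tuple[str, str]],
    ask_video_model: Callable[[str, str], str],  # hypothetical video-QA wrapper
) -> float:
    """Fraction of paper-derived questions a viewer model answers
    correctly from the generated video alone."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        ask_video_model(video_path, question).strip().lower()
        == gold.strip().lower()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)
```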
PaperTalker combines slide generation with an "effective tree search" layout refinement that explores candidate visual arrangements, cursor grounding that aligns pointer motion with the narration, automated subtitling and speech synthesis, and talking-head rendering that produces a presenter synchronized with the speech. Because each slide depends only on its own section of the paper, the framework parallelizes slide-wise generation for efficiency (see the sketch below). On the Paper2Video benchmark, its outputs were judged more faithful and informative than existing baselines, suggesting practical utility for scaling scientific communication and improving accessibility. The dataset, evaluation suite, and code are available, providing a reproducible testbed for future work on multi-modal, aligned content generation for research dissemination.
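A minimal sketch of that parallel, slide-wise pattern, assuming hypothetical stage functions (`best_layout`, `make_slide`) that stand in for PaperTalker's actual layout search, subtitling, and speech stages; this is not the authors' implementation.

```python
# Minimal sketch: independent per-slide work dispatched in parallel,
# with stubbed stages standing in for layout search, subtitling, and TTS.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class SlideOutput:
    index: int
    layout: str      # chosen visual arrangement
    subtitle: str    # narration text for this slide
    audio_path: str  # path to the synthesized speech clip

def best_layout(content: str) -> str:
    """Stand-in for tree-search layout refinement: enumerate a few
    candidate arrangements, score each, and keep the best."""
    candidates = ["single-column", "two-column", "figure-left"]
    # Dummy scorer; a real system would render and evaluate each candidate.
    return candidates[len(content) % len(candidates)]

def make_slide(index: int, section: str) -> SlideOutput:
    """One slide's full pipeline: layout, subtitle, speech (all stubbed)."""
    layout = best_layout(section)
    subtitle = f"Narration for slide {index}: {section[:40]}"
    audio_path = f"slide_{index}.wav"  # a real system would run TTS here
    return SlideOutput(index, layout, subtitle, audio_path)

def build_slides(sections: list[str]) -> list[SlideOutput]:
    """Slides depend only on their own paper section, so slide-wise
    generation can run in parallel; map preserves input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(make_slide, range(len(sections)), sections))

if __name__ == "__main__":
    for s in build_slides(["Introduction", "Method", "Experiments"]):
        print(s.index, s.layout, s.audio_path)
```

The design point the sketch illustrates is that per-slide independence is what makes the parallelism safe: no slide's layout or narration depends on another slide's output, so the results only need to be stitched together in order at the end.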