Depth Anything 3 (depth-anything-3.github.io)

🤖 AI Summary
Depth Anything 3 (DA3) is a streamlined geometry model that predicts spatially consistent 3D geometry from any number of images, whether or not camera poses are provided. Rather than using bespoke 3D architectures or juggling multiple losses, DA3 demonstrates that a single plain transformer backbone (e.g., a vanilla DINOv2 encoder) plus a single depth-ray prediction target is sufficient. Trained with a teacher–student paradigm, DA3 matches or exceeds the level of detail and generalization of its predecessor (DA2) while keeping the modeling minimal and modular.

DA3 also establishes a new visual-geometry benchmark spanning camera-pose estimation, any-view geometry, and visual rendering, and achieves state-of-the-art performance across all tasks: it improves camera-pose accuracy over the prior SOTA (VGGT) by an average of 35.7% and geometric accuracy by 23.6%, and it outperforms DA2 on monocular depth estimation. All models were trained exclusively on public academic datasets, underscoring reproducibility.

The work implies that simpler, transformer-centric designs with focused prediction targets can yield strong, generalizable 3D perception—reducing engineering complexity while boosting accuracy for downstream tasks like SLAM, view synthesis, and robot perception.
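To make the "plain backbone + single depth-ray target" idea concrete, here is a minimal PyTorch sketch of that architecture shape: a vanilla ViT-style encoder (standing in for DINOv2, with toy sizes) feeding one head that emits a per-patch depth value and a unit ray direction. All module names (`PlainTransformerEncoder`, `DepthRayHead`), dimensions, and output conventions are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlainTransformerEncoder(nn.Module):
    """Stand-in for a vanilla ViT backbone such as DINOv2 (toy sizes, hypothetical)."""

    def __init__(self, patch=14, dim=384, depth=6, heads=6, img_size=224):
        super().__init__()
        # Patchify the image with a strided conv, as in a standard ViT.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_tokens = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(
            dim, heads, dim * 4, batch_first=True, norm_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, C)
        return self.blocks(tokens + self.pos)


class DepthRayHead(nn.Module):
    """Single prediction head: per-patch depth (1 ch) plus a ray direction (3 ch)."""

    def __init__(self, dim=384):
        super().__init__()
        self.proj = nn.Linear(dim, 4)

    def forward(self, feats):
        out = self.proj(feats)
        depth = out[..., :1].exp()                   # positive depth
        rays = F.normalize(out[..., 1:], dim=-1)     # unit-norm ray directions
        return depth, rays


# Toy forward pass over a batch of views; camera poses are optional and unused here.
encoder, head = PlainTransformerEncoder(), DepthRayHead()
views = torch.randn(2, 3, 224, 224)                  # two input images
depth, rays = head(encoder(views))
print(depth.shape, rays.shape)                       # (2, 256, 1) (2, 256, 3)
```

The point of the sketch is the shape of the design: no 3D-specific modules or multi-branch losses, just generic attention over image tokens and one supervised target from which depth maps and camera geometry can both be recovered.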