Déjà View: Looping Transformers for Multi-View 3D Reconstruction (research.nvidia.com)

0 points 3 hours ago

🤖 AI Summary

Déjà View is a multi-view 3D reconstruction method built around a looping transformer. A single transformer block is applied repeatedly to iteratively refine camera poses and dense geometry from multiple input views. The model has only 117 million parameters, and its looping design lets users adjust how much computation to spend at inference time. Across five diverse benchmarks covering both indoor and outdoor scenes, Déjà View outperforms much larger feed-forward models while using 8 to 10 times fewer parameters and substantially less compute.

The key idea is to make iteration explicit inside the transformer. Reusing one block recurrently, rather than stacking many distinct layers, gives the model a stronger inductive bias for multi-view reconstruction than conventional feed-forward models provide. The results challenge the assumption that better reconstruction requires ever-larger models. They also offer a compact, high-performing alternative to large, resource-intensive networks in computer vision.
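
To make the idea of a looping block concrete, here is a minimal toy sketch in Python. It assumes a single-head self-attention block with pre-normalization and a residual connection. The class `LoopedBlock`, the helper `refine`, and the token sizes are hypothetical illustrations, not Déjà View's actual architecture or code. The sketch shows one thing only: a single set of weights is reused at every iteration, so the parameter count stays fixed while compute grows with the number of iterations chosen at inference time.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


class LoopedBlock:
    """Toy single-head attention block whose weights are shared across loop steps.

    Hypothetical illustration of weight tying; NOT Déjà View's real block.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                    for _ in range(dim)]

        self.dim = dim
        self.Wq, self.Wk, self.Wv, self.Wo = init(), init(), init(), init()
        self.calls = 0  # counts block applications, a proxy for inference compute

    def num_params(self):
        # Four dim x dim projection matrices, independent of iteration count.
        return 4 * self.dim * self.dim

    def __call__(self, tokens):
        self.calls += 1
        normed = [layer_norm(t) for t in tokens]  # pre-norm
        q = [matvec(self.Wq, t) for t in normed]
        k = [matvec(self.Wk, t) for t in normed]
        v = [matvec(self.Wv, t) for t in normed]
        scale = 1.0 / math.sqrt(self.dim)
        out = []
        for t, qi in zip(tokens, q):
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
            m = max(scores)
            w = [math.exp(s - m) for s in scores]  # numerically stable softmax
            z = sum(w)
            mixed = [sum(wj * vj[d] for wj, vj in zip(w, v)) / z
                     for d in range(self.dim)]
            update = matvec(self.Wo, mixed)
            out.append([a + b for a, b in zip(t, update)])  # residual connection
        return out


def refine(block, tokens, num_iters):
    """Apply the SAME block num_iters times: more iterations, more compute, same params."""
    for _ in range(num_iters):
        tokens = block(tokens)
    return tokens


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-ins for tokens pooled from several input views.
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    block = LoopedBlock(dim=8)
    coarse = refine(block, tokens, num_iters=2)  # cheaper inference
    fine = refine(block, tokens, num_iters=6)    # more compute, same weights
    print(block.num_params(), block.calls)
```

Running the script prints `256 8`: 256 parameters after 8 block applications (2 + 6). Choosing more iterations costs more compute but adds no parameters. For scale, the summary's figures imply comparison models of roughly 0.94 to 1.17 billion parameters (117M × 8 to 117M × 10).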
