Teaching robots to map large environments (news.mit.edu)

🤖 AI Summary
MIT researchers led by Dominic Maggio, Hyungtae Lim, and Luca Carlone announced a new SLAM system that lets vision-based robots build accurate, large-scale 3D maps in seconds by processing arbitrarily many images. Instead of trying to reconstruct a whole scene at once (current learned models can only handle roughly 60 images), their pipeline incrementally builds small submaps from short image batches, then stitches them together while estimating camera poses in real time. The result is fast, scalable mapping that runs without calibrated cameras or hand-tuned systems, which is crucial for time-sensitive use cases like search-and-rescue in disaster zones, XR headsets, and warehouse robots.

Technically, the key insight is blending modern learning-based depth/pose estimation with classical geometric optimization: the team devised a more flexible alignment model that represents and corrects the deformations introduced by learned submaps (not just rigid rotation/translation), enabling consistent stitching.

Tested on complex scenes (e.g., MIT Chapel from cellphone video), the system produced near-real-time 3D reconstructions with average errors under 5 cm and outperformed other methods in speed and accuracy. The work, to be presented at NeurIPS, shows that coupling learned perception with principled geometry makes high-quality, scalable SLAM practical for real-world robots.
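For intuition, here is a minimal, self-contained sketch of aligning two overlapping submaps with a transform that is more flexible than a rigid rotation/translation. This is not the authors' algorithm (their alignment model corrects richer deformations introduced by learned submaps); it simply fits a similarity transform (scale + rotation + translation) via the classic Umeyama method on hypothetical corresponding points, to show why going beyond rigid alignment matters when learned reconstructions drift in scale.

```python
# Illustrative sketch only -- not the MIT system's method. It contrasts rigid
# alignment with a slightly more flexible model (Sim(3): scale + rotation +
# translation), estimated from corresponding 3D points in two overlapping
# submaps. All names and data below are hypothetical.
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate s, R, t such that dst ~ s * R @ src + t (row-wise).

    src, dst: (N, 3) arrays of corresponding points from two submaps.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # guard against reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)          # variance of source points
    s = np.trace(np.diag(D) @ S) / var_src           # optimal isotropic scale
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Hypothetical example: submap B is a scaled, rotated, shifted copy of submap A,
# mimicking the scale drift a learned per-batch reconstruction can introduce.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
B = 1.2 * A @ Rz.T + np.array([0.5, -0.2, 1.0])

s, R, t = umeyama_similarity(A, B)
aligned = s * A @ R.T + t
print("mean alignment error:", np.linalg.norm(aligned - B, axis=1).mean())
```

A rigid-only fit would leave a residual scale error here; a full system along the lines the summary describes would use a still more expressive deformation model and fold such pairwise alignments into a global optimization over all submaps and camera poses.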