🤖 AI Summary
Researchers introduced SLIC (Sketch Line Information Consistency), a sketch-based cross-modal retrieval model designed to localize buildings by matching freehand architectural sketches to real-world photos. The approach targets semantic localization without relying on satellite positioning, enabling low-cost, robust navigation in GPS-denied settings and mixed indoor/outdoor environments. By focusing on line and contour cues—natural to quick sketches—SLIC aims to close the domain gap between hand-drawn inputs and photographic images, making sketch-driven retrieval practical for users without drawing expertise.
Technically, SLIC is a two-branch CycleGAN-style generative framework that learns mapping functions into a shared semantic space for sketches and images. VGG-16 extracts features, and a line-attention network highlights texture and line cues; L-A (line-attention) blocks enforce line-style consistency during training, while R-A blocks model deeper cross-modal relations during retrieval. The model uses adversarial and cycle-consistency objectives, but measures cycle consistency in line-feature space rather than raw pixel space. Grad-CAM visualizations informed sketch-simplification rules and dataset collection (from non-artists) to improve generality. Ablation studies and retrieval experiments across architectural datasets show the L-A/R-A design improves discrimination between visually similar buildings, indicating SLIC’s promise for precise, explainable sketch-based localization in AI/ML and location-tech applications.
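The key idea—cycle consistency computed on line features rather than raw pixels—can be sketched in a few lines. This is a minimal, illustrative toy: the generators `G_s2i`/`G_i2s` are stand-in linear maps (not the paper's networks), and `line_feat` is a crude finite-difference edge cue standing in for the line-attention extractor; all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def line_feat(x):
    # Crude line/edge cue: absolute horizontal + vertical finite differences.
    # A stand-in for SLIC's line-attention features, not the real extractor.
    dx = np.abs(np.diff(x, axis=0, prepend=x[:1]))
    dy = np.abs(np.diff(x, axis=1, prepend=x[:, :1]))
    return dx + dy

# Toy "generators": invertible linear maps so the cycle closes almost exactly.
W_s2i = np.eye(8) + 0.1 * rng.standard_normal((8, 8))
W_i2s = np.linalg.inv(W_s2i)

def G_s2i(x):  # sketch -> image space (hypothetical)
    return x @ W_s2i

def G_i2s(x):  # image -> sketch space (hypothetical)
    return x @ W_i2s

sketch = rng.standard_normal((8, 8))
recon = G_i2s(G_s2i(sketch))  # one full cycle

# Cycle-consistency loss mediated by line features instead of raw pixels:
cycle_line_loss = np.mean(np.abs(line_feat(recon) - line_feat(sketch)))
print(cycle_line_loss)  # near zero for this invertible toy cycle
```

Comparing `line_feat(recon)` to `line_feat(sketch)` penalizes deviations in line structure specifically, which is the consistency signal the summary attributes to the L-A design; a real implementation would combine this with adversarial losses and learned networks.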