🤖 AI Summary
Researchers publishing in Nature show that reorganizing a vision model’s internal representations to better match how humans group visual concepts makes those models more helpful, robust, and generalizable. They compare human and model “perception” with a classic odd-one-out task: given three images, which two are most similar? While humans often pick pairs based on high-level concepts (e.g., whether something is a vehicle or a living thing), models frequently rely on superficial cues such as texture, background color, or other low-level statistics. The team develops and tests a method to realign model embeddings with this human-like structure and demonstrates that doing so reduces surprising errors and improves out-of-distribution performance.
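A minimal sketch of how the odd-one-out task can be scored against a model’s embeddings is below. It assumes the three images have already been encoded by some pretrained vision model; the function names and the use of cosine similarity are illustrative choices, not details from the paper.

```python
# Sketch: deciding a model's odd-one-out choice from three precomputed embeddings.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_one_out(emb_a: np.ndarray, emb_b: np.ndarray, emb_c: np.ndarray) -> int:
    """Return the index (0, 1, or 2) of the image the model treats as the odd one out.

    The pair with the highest similarity is judged 'most similar';
    the remaining image is the odd one out.
    """
    sims = {
        2: cosine_sim(emb_a, emb_b),  # a and b most similar -> c is odd
        1: cosine_sim(emb_a, emb_c),  # a and c most similar -> b is odd
        0: cosine_sim(emb_b, emb_c),  # b and c most similar -> a is odd
    }
    return max(sims, key=sims.get)

# Comparing these model choices with human choices on the same image triplets
# reveals where the model's similarity structure diverges from human judgment.
```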
Technically, the work frames images as points in a high-dimensional embedding space and uses behavioral disagreement on the odd-one-out task to reveal where model geometry diverges from human judgment. By intervening on that geometry and reorganizing representation neighborhoods to reflect conceptual similarities, the models better capture attributes shared across visually different instances (e.g., grouping cars and airplanes by shared functional or material traits) and avoid brittle, texture-driven mistakes. This offers a practical path toward more intuitive, trustworthy vision systems, with clear implications for safer, more reliable applications in areas like autonomous driving, medical imaging, and image search.
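One simple way to make that kind of intervention concrete, sketched below, is to keep the backbone frozen and learn a small linear map over its embeddings with a triplet-style loss built from human odd-one-out choices, so that the humanly similar pair ends up closer together than either image is to the odd one. This is an illustrative approach under those assumptions, not the paper’s exact procedure; the class, function, and data-loader names are hypothetical.

```python
# Sketch: nudging a frozen model's embedding geometry toward human similarity
# judgments by learning a linear re-mapping with a triplet-style alignment loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Learned linear re-mapping applied on top of frozen backbone embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize so dot products act as cosine similarities.
        return F.normalize(self.proj(x), dim=-1)

def triplet_alignment_loss(head: AlignmentHead,
                           emb_i: torch.Tensor,
                           emb_j: torch.Tensor,
                           emb_odd: torch.Tensor,
                           margin: float = 0.2) -> torch.Tensor:
    """Push the human-chosen similar pair (i, j) to be closer in the re-mapped
    space than either image is to the human-chosen odd-one-out."""
    zi, zj, zo = head(emb_i), head(emb_j), head(emb_odd)
    sim_pair = (zi * zj).sum(-1)
    sim_io = (zi * zo).sum(-1)
    sim_jo = (zj * zo).sum(-1)
    return (F.relu(margin + sim_io - sim_pair) +
            F.relu(margin + sim_jo - sim_pair)).mean()

# Hypothetical training loop over precomputed embeddings of human triplet data:
# head = AlignmentHead(dim=768)
# opt = torch.optim.Adam(head.parameters(), lr=1e-3)
# for emb_i, emb_j, emb_odd in human_triplet_batches:
#     loss = triplet_alignment_loss(head, emb_i, emb_j, emb_odd)
#     opt.zero_grad(); loss.backward(); opt.step()
```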