We're (Still) Not Giving Data Enough Credit (www.lukew.com)

🤖 AI Summary
In a talk at Sutter Hill, UC Berkeley's Alexei Efros argued that data, not algorithms, is the primary driver of progress in visual AI. Efros traces a longstanding "algorithm-first" bias in academia that downplays dataset design and scale, and gives concrete examples: three classic face-detection papers reached similar performance once negative (non-face) examples were included, and his team's image hole-filling worked by brute-force nearest-neighbor lookup over 2 million Flickr images. When methods are compared on identical datasets, sophisticated neural nets often perform comparably to simple lookup-based approaches, suggesting many advances are interpolation enabled by rich data rather than novel algorithmic insight. The talk reframes AI as a cultural, data-compression technology: human perception itself is highly data-driven (humans recall meaningful natural images far better than random textures), and modern models act as "distillation machines" that compress civilization's experiences. The technical implication is that breakthroughs track data availability: text benefits from vast corpora, images are catching up, while video and robotics lag because of data scarcity. Efros recommends a practical startup test: "Is there enough data for this problem?" He also nudges a deeper question: true machine autonomy may require bootstrapping beyond human artifacts, not just better interpolation of them.
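To make the "lookup, not learning" point concrete, here is a minimal sketch of brute-force nearest-neighbor hole filling in the spirit the summary describes. It is an illustrative assumption, not the published system: the descriptor (a downsampled grayscale thumbnail), the function names, and the same-resolution compositing are all simplifications chosen to keep the example self-contained.

```python
# Sketch: data-driven hole filling by nearest-neighbor lookup.
# Descriptor and compositing are deliberately naive; the point is that
# the "algorithm" reduces to a distance computation over a large dataset.
import numpy as np

def tiny_descriptor(image: np.ndarray, size: int = 16) -> np.ndarray:
    """Downsample an HxWx3 image to a size-by-size grayscale thumbnail vector."""
    gray = image.mean(axis=2)
    h, w = gray.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return gray[np.ix_(ys, xs)].ravel()

def fill_hole(query: np.ndarray, hole_mask: np.ndarray,
              collection: list[np.ndarray]) -> np.ndarray:
    """Fill the masked pixels of `query` with pixels from the closest image
    in `collection`, where "closest" is L2 distance between thumbnails."""
    q_desc = tiny_descriptor(query)
    # Brute force: compare the query descriptor against every image in the set.
    dists = [np.linalg.norm(q_desc - tiny_descriptor(img)) for img in collection]
    best = collection[int(np.argmin(dists))]
    # Naive composite: assumes the best match has the query's resolution.
    result = query.copy()
    result[hole_mask] = best[hole_mask]
    return result
```

The real scene-completion work reportedly used richer scene descriptors and seam-aware blending rather than a hard pixel paste, but the structure the sketch preserves is the one the talk emphasizes: once the collection is large enough, most of the heavy lifting is done by the data, not the method.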