Intelligence per Watt: A Study of Local Intelligence Efficiency (hazyresearch.stanford.edu)

🤖 AI Summary
A new large-scale study introduces "intelligence per watt" (IPW), a unified metric that measures task accuracy delivered per watt of power, and uses it to evaluate whether local language models (≤20B active parameters) and edge accelerators can realistically shoulder a meaningful share of today's AI demand. The authors benchmarked 20+ state-of-the-art local LMs across 3 local and 5 enterprise accelerators on 1 million real-world single-turn chat and reasoning queries (batch size = 1), measuring accuracy (LLM-as-a-judge for open-ended chat, exact match for tasks with ground truth) and power (NVML/powermetrics sampled at 50 ms).

Key findings: local LMs now accurately answer 88.7% of single-turn queries, with accuracy improving 3.1× from 2023 to 2025; overall IPW for local setups improved 5.3× over the same period (3.1× from model advances, 1.7× from hardware); but local accelerators (e.g., Qwen3-32B on an Apple M4 Max) still show ~1.5× lower IPW than enterprise-grade hardware (an NVIDIA B200), indicating clear efficiency headroom.

The study argues this efficiency trend could shift some inference from centralized data centers to billions of edge devices, freeing capacity for frontier workloads and enabling ubiquitous, low-latency AI in earbuds, glasses, and phones, provided designers optimize for IPW across the model and hardware stacks. Caveats include the single-turn focus (no evaluation of long-horizon planning, tool use, or large-batch server optimizations), measurement noise in software-based power counters (≈10–15%), and accuracy being an imperfect proxy for "intelligence." The authors release a profiling harness to track IPW as models and accelerators evolve, calling on the community to prioritize energy-to-intelligence efficiency.
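
To make the metric concrete, here is a minimal sketch of the IPW computation as the summary defines it (task accuracy divided by average power draw). The `RunResult` type, function name, and all numbers are illustrative, not taken from the paper or its harness:

```python
# Illustrative IPW computation, assuming the summary's definition:
# accuracy delivered per watt of average power draw during inference.
from dataclasses import dataclass

@dataclass
class RunResult:
    accuracy: float          # fraction of queries answered correctly (0-1)
    avg_power_watts: float   # mean power draw over the benchmark run

def intelligence_per_watt(run: RunResult) -> float:
    """IPW = task accuracy / average power (units: accuracy per watt)."""
    return run.accuracy / run.avg_power_watts

# Hypothetical numbers only, to show how a ~1.5x IPW gap between a
# local accelerator and enterprise hardware would surface:
local = RunResult(accuracy=0.80, avg_power_watts=60.0)
server = RunResult(accuracy=0.85, avg_power_watts=42.0)
print(intelligence_per_watt(local), intelligence_per_watt(server))
```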
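
And a minimal power-sampling loop in the spirit of the study's measurement setup on NVIDIA hardware, using the real NVML bindings (`pynvml`). The 50 ms interval comes from the summary; the function shape, device index, and averaging are assumptions, not the authors' harness:

```python
# Sketch: poll GPU power via NVML every 50 ms and return the mean in watts.
import time
import pynvml

def sample_avg_power(duration_s: float, interval_s: float = 0.05) -> float:
    """Sample GPU 0's power draw for duration_s seconds; return mean watts."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        samples = []
        end = time.monotonic() + duration_s
        while time.monotonic() < end:
            # nvmlDeviceGetPowerUsage reports milliwatts; convert to watts.
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
            time.sleep(interval_s)
        return sum(samples) / len(samples)
    finally:
        pynvml.nvmlShutdown()
```

On Apple silicon the analogous signal would come from the `powermetrics` CLI mentioned in the summary, which requires parsing its text output rather than a library call.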