🤖 AI Summary
Kraska et al. propose replacing traditional B‑trees and other index structures with machine-learned models that predict a key’s position in a sorted array. The core insight is that an index is just a function mapping keys to record positions, essentially a scaled cumulative distribution function (CDF) of the key distribution, so you can train a model (linear regressors, small neural nets) to approximate that CDF. The paper introduces the Recursive Model Index (RMI): a cascade of lightweight models in which a top-level model routes a key to a model in the next stage, and the final predicted position is refined with a short “last‑mile” binary search within the model’s recorded error bounds to guarantee correctness.
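To make the predict-then-correct idea concrete, here is a minimal sketch of a two-stage RMI in Python. This is not the paper’s implementation: the class name `TwoStageRMI`, the use of `np.polyfit` linear models for both stages, and the per-leaf max-error bookkeeping are illustrative assumptions, but the structure (root model routes to a leaf model, leaf predicts a position, bounded local search fixes the error) follows the RMI description above.

```python
import bisect
import numpy as np

class TwoStageRMI:
    """Minimal two-stage RMI sketch over a sorted key array: a root linear
    model routes each key to one of `num_leaves` second-stage linear models,
    the chosen leaf predicts a position, and a bounded "last-mile" binary
    search inside that leaf's recorded max error returns the exact slot."""

    def __init__(self, keys, num_leaves=64):
        self.keys = np.asarray(keys, dtype=float)   # assumed sorted ascending
        self.n = len(self.keys)
        self.num_leaves = num_leaves
        positions = np.arange(self.n)

        # Stage 1: one linear model key -> position over the whole array.
        self.root = np.polyfit(self.keys, positions, 1)

        # Stage 2: partition keys by the root's prediction, fit one linear
        # model per partition, and record each partition's max absolute error.
        leaf_ids = self._route(self.keys)
        self.leaves, self.errs = [], []
        for leaf in range(num_leaves):
            mask = leaf_ids == leaf
            if mask.sum() >= 2:
                model = np.polyfit(self.keys[mask], positions[mask], 1)
            else:
                model = self.root                    # too few keys: reuse root
            if mask.any():
                err = np.abs(np.polyval(model, self.keys[mask]) - positions[mask]).max()
            else:
                err = 0.0
            self.leaves.append(model)
            self.errs.append(int(np.ceil(err)))

    def _route(self, keys):
        # Map the root model's predicted position onto a leaf index.
        pred = np.polyval(self.root, keys) * self.num_leaves / self.n
        return np.clip(pred.astype(int), 0, self.num_leaves - 1)

    def lookup(self, key):
        leaf = int(self._route(np.asarray([float(key)]))[0])
        pos = int(np.clip(np.polyval(self.leaves[leaf], key), 0, self.n - 1))
        lo = max(0, pos - self.errs[leaf])
        hi = min(self.n, pos + self.errs[leaf] + 1)
        # "Last-mile" binary search within the error bound fixes any mispredict.
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), float(key))
        return i if i < self.n and self.keys[i] == key else None


# Example usage of the sketch: build over 100k lognormal keys and look one up.
keys = np.sort(np.random.lognormal(size=100_000))
idx = TwoStageRMI(keys)
print(idx.lookup(keys[12_345]))   # -> 12345 (or the first slot of a duplicate run)
```

Because each leaf stores its worst-case prediction error observed at build time, the final search only ever touches a small window of the array, which is where the speed and space advantage over a full B‑tree traversal comes from.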
This idea matters because it brings ML into a fundamental systems component: the paper demonstrates substantial speed and space improvements on read-heavy workloads and opens a new design space for learned data structures. Key technical implications: RMIs trade generality for workload-awareness (they excel when key distributions are regular but need retraining under distribution shift or heavy updates), require careful handling of range queries and concurrency, and rely on a hybrid ML-plus-algorithm approach to preserve correctness (prediction plus bounded local search). The paper sparked a wave of research and practical questions about robustness, online updates, integration into DBMSs, and hardware-aware model choices, making learned indexes a pivotal example of ML for systems optimization.
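On the range-query point: because the data stays in sorted order, a range scan can reuse the same predict-then-correct step to find the lower bound and then read sequentially. The sketch below builds on the hypothetical `TwoStageRMI` above and assumes the lower-bound key is itself a stored key; for arbitrary lower bounds the model must be monotonic or the search window widened, which is exactly the kind of careful handling mentioned above.

```python
import bisect
import numpy as np

def range_query(idx, lo_key, hi_key):
    """Range scan on the TwoStageRMI sketch: locate the first slot >= lo_key
    with predict + bounded search, then scan the sorted array up to hi_key."""
    leaf = int(idx._route(np.asarray([float(lo_key)]))[0])
    pos = int(np.clip(np.polyval(idx.leaves[leaf], lo_key), 0, idx.n - 1))
    lo = max(0, pos - idx.errs[leaf])
    hi = min(idx.n, pos + idx.errs[leaf] + 1)
    start = lo + bisect.bisect_left(idx.keys[lo:hi].tolist(), float(lo_key))
    end = start
    while end < idx.n and idx.keys[end] <= hi_key:   # sequential scan of the range
        end += 1
    return idx.keys[start:end]
```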