The PGM-index (pgm.di.unipi.it)

🤖 AI Summary
The PGM-index is a learned index structure, introduced by Paolo Ferragina and Giorgio Vinciguerra, that uses compact piecewise linear models with provable worst-case guarantees as an alternative to traditional B-trees for lookups over sorted keys. Rather than storing every key in tree nodes, it fits a sequence of linear segments that map each key to its approximate position in a sorted array, to within a user-chosen error bound ε; a lookup then only has to search a window of about 2ε positions around the prediction. This shrinks the index and supports fast point and range queries. The library and the PVLDB 2020 paper also describe a fully dynamic variant that supports insertions and deletions while keeping the index space-efficient and the lookup error bounded. For the AI/ML community this matters because indexing and retrieval sit at the core of large-scale data pipelines, embedding stores, feature tables, and model-serving systems, where both latency and memory footprint are critical. The PGM-index combines theoretical worst-case bounds on space and query time with practical compression and dynamic updates, which makes it an attractive replacement for heavier tree-based indexes when the key distribution can be captured by a small number of linear segments. Key technical points: piecewise linear approximation of the key-to-position map; a configurable error ε that controls the space/time tradeoff (a larger ε means fewer segments but a wider final search); compressed storage of the segment parameters; and update algorithms that maintain the segment sequence while preserving the guarantees.
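To make the key-to-position idea concrete, here is a minimal Python sketch of an ε-bounded piecewise linear index. It is not the PGM-index library or the paper's algorithm: it assumes a single level of segments (no recursive structure), sorted distinct integer keys held in memory, and a simple greedy "shrinking cone" segmentation in place of an optimal one. The names `build_segments` and `PiecewiseLinearIndex` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def _pick_slope(lo, hi):
    # A one-point segment never tightens hi; any finite slope works there.
    return lo if hi == float("inf") else (lo + hi) / 2


def build_segments(keys, eps):
    """Greedy segmentation of sorted, distinct keys (illustrative only).

    Each segment is (first_key, first_pos, slope) and predicts every
    position it covers to within eps.
    """
    if not keys:
        return []
    segments = []
    k0, p0 = keys[0], 0
    lo, hi = 0.0, float("inf")  # feasible slope interval (the "cone")
    for p in range(1, len(keys)):
        dx = keys[p] - k0
        # Slopes that keep the prediction for keys[p] within eps of p.
        new_lo = max(lo, (p - p0 - eps) / dx)
        new_hi = min(hi, (p - p0 + eps) / dx)
        if new_lo > new_hi:
            # No single slope fits: close this segment, start a new one.
            segments.append((k0, p0, _pick_slope(lo, hi)))
            k0, p0 = keys[p], p
            lo, hi = 0.0, float("inf")
        else:
            lo, hi = new_lo, new_hi
    segments.append((k0, p0, _pick_slope(lo, hi)))
    return segments


class PiecewiseLinearIndex:
    def __init__(self, keys, eps):
        self.keys = keys
        self.eps = eps
        self.segments = build_segments(keys, eps)
        self.first_keys = [s[0] for s in self.segments]

    def find(self, key):
        """Return the position of key in self.keys, or None if absent."""
        if not self.segments:
            return None
        # Locate the segment whose first key is the largest one <= key.
        i = max(bisect_right(self.first_keys, key) - 1, 0)
        k0, p0, slope = self.segments[i]
        pred = round(p0 + slope * (key - k0))
        # Correct the prediction with a binary search in a 2*eps window.
        lo = max(pred - self.eps, 0)
        hi = min(pred + self.eps + 1, len(self.keys))
        j = bisect_left(self.keys, key, lo, hi)
        if j < len(self.keys) and self.keys[j] == key:
            return j
        return None


if __name__ == "__main__":
    keys = list(range(0, 1000, 5))
    index = PiecewiseLinearIndex(keys, eps=4)
    print(len(index.segments), index.find(250), index.find(251))
```

In this sketch, raising `eps` lets each segment cover more keys, so fewer segments are stored, at the cost of a wider final binary search. That is the same space/time tradeoff the summary attributes to ε.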