Larger Than RAM Vector Indexes for Relational Databases (planetscale.com)

🤖 AI Summary
PlanetScale announced a native MySQL vector index design that tackles the practical problems of running approximate nearest-neighbor (ANN) search inside a relational database, specifically supporting indexes that are much larger than RAM while preserving transactional guarantees. The team found little prior research on the trade-offs that real-world databases impose (sharding, InnoDB-managed on-disk datasets, continuous inserts/updates/deletes, crash recovery), so they built new solutions to make vector indexes behave like ordinary SQL indexes: committed vectors are immediately visible, aborted transactions don't pollute the index, and durability and isolation are honored even during failover. Technically, they start from HNSW (a multi-layer graph structure that is a standard choice for ANN search) but address its two big mismatches with database usage: HNSW usually assumes the whole graph fits in RAM, and it is mostly static. PlanetScale supports an in-memory HNSW variant (high performance, ~99.9% recall if RAM is sized to the dataset) and, crucially, a new hybrid disk-backed index that lets terabyte-scale datasets live under MySQL without giving up ACID semantics. The hybrid design accepts performance trade-offs versus pure in-memory HNSW but ensures transactional consistency, crash resilience, and incremental updates, making vector search practical and predictable for production relational databases.
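For readers unfamiliar with the recall figure quoted above, here is a minimal sketch of how recall@k is usually measured: compare an ANN result against the exact top-k found by brute force. This is illustrative only and is not PlanetScale's code; the function names `exact_top_k` and `recall_at_k`, the random data, and the simulated "approximate" result are all assumptions made for the example.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force k nearest neighbours by Euclidean distance; returns row ids."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]

exact = exact_top_k(query, vectors, 10)

# Hypothetical stand-in for an ANN index: it returns nine of the ten true
# neighbours plus one id that is not in the true top-10.
wrong_id = next(i for i in range(len(vectors)) if i not in exact)
approx = exact[:9] + [wrong_id]

print(recall_at_k(approx, exact))  # 0.9
```

A recall of ~99.9% means that, averaged over many queries, almost every true nearest neighbour appears in the approximate result; the brute-force scan used as ground truth here is exactly the cost an ANN index exists to avoid.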