Postgres vector database extensions - A Benchmark (seanpedersen.github.io)


🤖 AI Summary

A crowd-sourced benchmark compared three Postgres vector extensions: pgvector, PGVectorScale (pgvectorscale) and VectorChord. Using 450K 1024-dim float32 text embeddings, it measured query latency, recall@100, index build time and index size. The test shows that approximate nearest-neighbor (ANN) indices in Postgres can deliver large speedups over brute-force search while keeping ACID guarantees, SQL filtering and existing data tooling.

Measured outcomes: VectorChord's vchordrq index (IVF + RaBitQ) matched brute-force recall (100%) at ~469 ms query latency (≈3x speedup), with a 1,383 s build time and a 2,229 MB index. pgvector's HNSW index also reached 100% recall but was slower and used more memory (611 ms latency, 3,555 MB index). pgvector's IVFFlat was faster than both (412 ms, ≈3.4x speedup), but its recall is known to degrade unless the index is rebuilt after many inserts or deletes. PGVectorScale's DiskANN variant delivered by far the fastest queries (6.4 ms) and the smallest index (254 MB), but with catastrophically low recall (2%), a single-threaded index build and a requirement for SSD storage.

Significance: the results reinforce that extending Postgres is a pragmatic default for most ML and production use cases where complex metadata filtering and ACID semantics matter. VectorChord stood out for its balance of accuracy, performance and developer experience (easy pre-filtering), while DiskANN promises extreme scale at the cost of accuracy and tooling maturity. The author plans broader benchmarks (insertion performance, realistic SQL filters, and scales of 100M to 1B vectors) to better map these trade-offs for large deployments.
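
To make "IVF index" and "recall@k against brute force" more concrete, here is a toy sketch in pure Python. It is not code from any of the extensions: it clusters random vectors with a few rounds of k-means, stores each vector in the list of its nearest centroid, and at query time scans only the nprobe closest lists. It ignores RaBitQ quantization and graph indices such as HNSW and DiskANN entirely. All function names are hypothetical; n_lists and nprobe only loosely mirror pgvector's IVFFlat `lists` build option and `ivfflat.probes` setting.

```python
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(data, query, k):
    """Exact k-NN: compare the query against every stored vector (a sequential scan)."""
    return sorted(range(len(data)), key=lambda i: l2(data[i], query))[:k]


def assign(data, centroids):
    """Put each vector's index into the list (bucket) of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(data):
        nearest = min(range(len(centroids)), key=lambda j: l2(v, centroids[j]))
        buckets[nearest].append(i)
    return buckets


def build_ivf(data, n_lists, iters=5, seed=0):
    """Toy IVF build: a few rounds of k-means, then a final bucket assignment."""
    rng = random.Random(seed)
    dim = len(data[0])
    centroids = [list(v) for v in rng.sample(data, n_lists)]
    for _ in range(iters):
        buckets = assign(data, centroids)
        for j, members in enumerate(buckets):
            if members:  # an empty list keeps its previous centroid
                centroids[j] = [sum(data[i][d] for i in members) / len(members)
                                for d in range(dim)]
    return centroids, assign(data, centroids)


def ivf_search(data, centroids, buckets, query, k, nprobe):
    """Approximate k-NN: rank only the vectors in the nprobe closest lists."""
    probe = sorted(range(len(centroids)), key=lambda j: l2(query, centroids[j]))[:nprobe]
    candidates = [i for j in probe for i in buckets[j]]
    return sorted(candidates, key=lambda i: l2(data[i], query))[:k]


def recall_at_k(approx, exact):
    """Fraction of the true top-k neighbours that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(10)]
    k, n_lists = 10, 20
    centroids, buckets = build_ivf(data, n_lists)
    for nprobe in (1, 5, n_lists):
        recalls = [recall_at_k(ivf_search(data, centroids, buckets, q, k, nprobe),
                               brute_force_knn(data, q, k)) for q in queries]
        print(f"nprobe={nprobe:2d}  mean recall@{k} = {sum(recalls) / len(recalls):.2f}")
```

Raising nprobe scans more lists, trading latency for recall. With nprobe equal to the number of lists, the search degenerates into brute force and recall reaches 1.0. The sketch also hints at why IVFFlat can drift: centroids are fixed at build time, so heavy inserts and deletes can leave the lists poorly matched to the data until the index is rebuilt.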
