Optimizing Filtered Vector Queries in PostgreSQL from Seconds to Milliseconds (www.clarvo.ai)

🤖 AI Summary
Clarvo debugged and restructured its pgvector + HNSW queries after noticing query times growing linearly with dataset size; the root cause was that poorly written SQL prevented the HNSW index from being used. By following a set of best practices they cut filtered vector-query latency from up to tens of seconds to single-digit milliseconds, demonstrating that PostgreSQL + pgvector can be a viable, low-cost alternative to dedicated vector databases when queries are written to let the index do the heavy lifting. Key takeaways for engineers (a minimal SQL sketch of these patterns follows below):

- Keep HNSW indexes fully resident in RAM (use pg_prewarm) for true 1–2 ms retrieval of the top 500 neighbors from hundreds of thousands of 1,536-dimensional vectors.
- Define indexes with vector_ip_ops for normalized vectors so queries can use the negative inner product operator (<#>), which is equivalent to cosine ordering in that case.
- Use post-filtering with pgvector's iterative scan (an oversampling loop) rather than trying to pre-filter the graph.
- Make the ORDER BY (distance expression) the last clause, followed only by LIMIT, so the iterative scan can work.
- Simplify WHERE clauses, favoring EXISTS or denormalized columns over costly joins.
- Test with EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS, TIMING) to confirm index usage.

Next steps include vector quantization and partitioning (binary/scalar/product/rotational quantization, halfvec) to reduce memory and CPU use with predictable recall trade-offs.
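To make the takeaways above concrete, here is a minimal sketch of the query pattern, not taken from the article: the documents table, tenant_id filter column, filtered_knn statement name, and the specific GUC values are assumptions for illustration, and the iterative-scan settings require pgvector 0.8.0 or later.

```sql
-- Assumed schema (not from the article): a documents table with a
-- normalized 1,536-dim embedding and a denormalized tenant_id filter column.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

CREATE TABLE IF NOT EXISTS documents (
    id        bigint PRIMARY KEY,
    tenant_id bigint NOT NULL,
    embedding vector(1536) NOT NULL   -- assumed L2-normalized on write
);

-- Inner-product HNSW index; with normalized vectors, ordering by negative
-- inner product (<#>) is equivalent to ordering by cosine distance.
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_ip_ops);

-- Pull the index into shared buffers so graph traversal never hits disk.
SELECT pg_prewarm('documents_embedding_hnsw');

-- pgvector 0.8.0+ iterative (oversampling) scan for post-filtered queries:
-- keep walking the graph until the WHERE clause has let enough rows through.
-- The numeric values below are example tuning points, not recommendations.
SET hnsw.iterative_scan = relaxed_order;
SET hnsw.ef_search = 200;           -- candidate pool size per scan
SET hnsw.max_scan_tuples = 20000;   -- cap on how far oversampling may go

-- Filtered nearest-neighbour query: simple WHERE clause, distance expression
-- as the final ORDER BY, followed only by LIMIT, so the index can be used.
PREPARE filtered_knn(vector) AS
    SELECT id, embedding <#> $1 AS neg_inner_product
    FROM documents
    WHERE tenant_id = 42              -- denormalized filter, no join needed
    ORDER BY embedding <#> $1
    LIMIT 500;

-- Confirm the plan actually walks the HNSW index (look for an Index Scan
-- using documents_embedding_hnsw) and how many buffers it touches:
-- EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS, TIMING)
-- EXECUTE filtered_knn('[...1536 floats...]');
```

The key design point is that the planner only considers the HNSW index when the query ends with ORDER BY on the indexed distance expression followed by LIMIT; anything that forces a different ordering or wraps the distance in extra expressions pushes PostgreSQL back to a sequential scan, which is where the linear growth in latency comes from.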
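For the quantization next step, one possible shape of the halfvec option mentioned above is a half-precision expression index, again a hedged sketch over the same assumed table rather than the article's implementation; halfvec and halfvec_ip_ops require pgvector 0.7.0 or later.

```sql
-- Half-precision (scalar-quantized) HNSW index: the halfvec cast roughly
-- halves index size and memory footprint at a small recall cost.
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_half
    ON documents USING hnsw ((embedding::halfvec(1536)) halfvec_ip_ops);

-- Queries must repeat the same cast so the planner matches the expression index.
PREPARE filtered_knn_half(halfvec) AS
    SELECT id
    FROM documents
    WHERE tenant_id = 42
    ORDER BY embedding::halfvec(1536) <#> $1
    LIMIT 500;
```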