🤖 AI Summary
Google Research's new MUVERA embeddings address a longstanding challenge in multi-vector retrieval: multi-vector search is more accurate than traditional single-vector search, but much slower to query. Multi-vector models like ColBERT represent documents and queries with multiple embeddings, enabling more nuanced matching, but common search structures are optimized for single vectors and struggle with the asymmetric, non-metric MaxSim scoring used in multi-vector retrieval. MUVERA's solution is to transform variable-length multi-vector representations into fixed-dimensional single vectors that approximate the original multi-vector similarity, enabling fast initial retrieval with standard vector-search algorithms, followed by precise reranking using the full multi-vector embeddings.
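The MaxSim score mentioned above is what makes multi-vector retrieval both accurate and hard to index: for each query token, it takes the best-matching document token and sums those maxima. A minimal numpy sketch (function name and toy data are illustrative, not from MUVERA or ColBERT):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """MaxSim as used in ColBERT-style late interaction: for each query
    token vector, take the maximum similarity over all document token
    vectors, then sum. Note the asymmetry that breaks standard metric
    indexes: maxsim_score(Q, D) generally differs from maxsim_score(D, Q)."""
    # Pairwise similarity matrix: (num_query_tokens, num_doc_tokens)
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

# Toy example: two identical query tokens, two orthogonal document tokens.
Q = np.array([[1.0, 0.0], [1.0, 0.0]])
D = np.array([[1.0, 0.0], [0.0, 1.0]])
print(maxsim_score(Q, D))  # both query tokens match D's first token: 2.0
print(maxsim_score(D, Q))  # only one of D's tokens matches Q: 1.0
```

The asymmetry is visible even in this toy case, which is why single-vector structures built around symmetric distances cannot index MaxSim directly.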
Technically, MUVERA uses SimHash-based locality-sensitive hashing to partition token vectors into 2^k_sim regions, aggregates the vectors in each partition differently for queries and documents, and combines multiple independent repetitions with random projections for dimensionality reduction. Although MUVERA embeddings are larger than typical single-vector embeddings (potentially tens of thousands of dimensions), they significantly reduce query-time computation. On the BeIR nfcorpus dataset, MUVERA-only search was about 8x faster but noticeably less accurate, while MUVERA combined with reranking recovered nearly all of the multi-vector search quality at a 7x speedup. FastEmbed 0.7.2 now supports MUVERA embeddings as a post-processing step compatible with existing multi-vector models, easing the storage and latency concerns that have kept efficient multi-vector retrieval out of reach for many real-world applications.
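The SimHash partitioning and query/document aggregation described above can be sketched in a few lines of numpy. This is a simplified, hedged illustration of one repetition of the fixed-dimensional encoding (FDE) idea, not MUVERA's actual implementation: the paper additionally fills empty document partitions from neighboring buckets and applies final random projections, which are omitted here.

```python
import numpy as np

def simhash_fde(vectors: np.ndarray, hyperplanes: np.ndarray,
                is_query: bool) -> np.ndarray:
    """One repetition of a MUVERA-style fixed-dimensional encoding (sketch).
    SimHash assigns each token vector to one of 2^k_sim buckets via the
    sign pattern of its projections onto k_sim random hyperplanes.
    Query FDEs SUM the vectors in each bucket; document FDEs AVERAGE them,
    so a single dot product of the two FDEs approximates MaxSim."""
    k_sim, dim = hyperplanes.shape[0], vectors.shape[1]
    signs = (vectors @ hyperplanes.T > 0).astype(int)   # (n_tokens, k_sim)
    buckets = signs @ (2 ** np.arange(k_sim))           # bucket id per token
    fde = np.zeros((2 ** k_sim, dim))
    counts = np.zeros(2 ** k_sim)
    for b, v in zip(buckets, vectors):
        fde[b] += v
        counts[b] += 1
    if not is_query:                                    # documents: mean per bucket
        nonempty = counts > 0
        fde[nonempty] /= counts[nonempty, None]
    return fde.ravel()                                  # fixed dim: 2^k_sim * dim

# Toy usage: 8-dim token vectors, k_sim = 3 hyperplanes -> 2^3 * 8 = 64 dims.
rng = np.random.default_rng(42)
planes = rng.normal(size=(3, 8))
q_fde = simhash_fde(rng.normal(size=(5, 8)), planes, is_query=True)
d_fde = simhash_fde(rng.normal(size=(12, 8)), planes, is_query=False)
score = float(q_fde @ d_fde)  # one dot product approximates MaxSim(Q, D)
```

Because the output dimension is fixed at 2^k_sim times the token dimension per repetition, concatenating several repetitions explains why MUVERA vectors can reach tens of thousands of dimensions, and why random projections are used to shrink them afterwards.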