Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG (github.com)

0 points 14 hours ago | visit original

🤖 AI Summary

Pyversity is a lightweight, NumPy-only library for fast result diversification in retrieval and RAG workflows. Installed with pip install pyversity, it exposes a unified API that re-ranks retrieval outputs to reduce near-duplicate results. It implements four popular strategies (MMR, MSD, DPP and COVER) and returns a DiversificationResult that includes the selected indices and their selection scores. A single diversity parameter (0.0–1.0) controls the relevance-vs-diversity tradeoff, and the implementations are optimized so that common cases run in milliseconds on typical batch sizes.

For practitioners this matters because naïve relevance-only ranking often yields redundant top hits that hurt user experience or prompt quality in LLM pipelines. Pyversity offers practical choices: MMR as a fast, general-purpose default (O(k·n·d)); MSD for stronger spread; DPP for probabilistic “repulsion” at more compute (O(k·n·d + n·k²)); and COVER/facility-location when you need coverage at the cost of higher complexity (O(k·n²)). The package maps directly to real-world needs such as e-commerce, news, academic search and RAG, enabling better exploration, broader coverage, and fewer duplicate passages fed into downstream models, all with minimal dependency overhead and research-backed algorithms.
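To make the relevance-vs-diversity tradeoff concrete, here is a minimal sketch of greedy MMR in plain NumPy. It is not Pyversity's code or API: the function name mmr_select, its arguments, and the cosine-similarity choice are assumptions made for illustration. Each of the k steps does one O(n·d) similarity update, which matches the O(k·n·d) cost quoted above.

```python
import numpy as np

def mmr_select(embeddings, scores, k, diversity=0.5):
    """Greedy MMR re-ranking (hypothetical helper, not Pyversity's API).

    embeddings: (n, d) array of item vectors
    scores:     (n,) array of relevance scores
    diversity:  0.0 = pure relevance, 1.0 = pure diversity
    Returns the list of selected indices in selection order.
    """
    emb = np.asarray(embeddings, dtype=float)
    rel = np.asarray(scores, dtype=float)
    n = emb.shape[0]
    k = min(k, n)

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)

    selected = []
    available = np.ones(n, dtype=bool)
    # Highest similarity of each item to anything selected so far.
    max_sim = np.full(n, -np.inf)

    for _ in range(k):
        # No redundancy penalty before the first pick.
        penalty = max_sim if selected else np.zeros(n)
        mmr = (1.0 - diversity) * rel - diversity * penalty
        mmr[~available] = -np.inf  # never re-select an item
        best = int(np.argmax(mmr))
        selected.append(best)
        available[best] = False
        # One O(n*d) update per step keeps the total at O(k*n*d).
        max_sim = np.maximum(max_sim, emb @ emb[best])
    return selected

# Items 0 and 1 are near-duplicates; item 2 is unrelated but less relevant.
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
rel = np.array([0.9, 0.85, 0.5])
print(mmr_select(emb, rel, k=2, diversity=0.0))  # [0, 1]
print(mmr_select(emb, rel, k=2, diversity=0.5))  # [0, 2]
```

With diversity at 0.0 the near-duplicate pair wins on relevance alone; at 0.5 the second slot goes to the unrelated item, which is the behavior the summary describes.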
