Show HN: Pg_textsearch – BM25 Ranking for Postgres (docs.tigerdata.com)

0 points 7 days ago | visit original

🤖 AI Summary

Tiger Data has released pg_textsearch, a Postgres extension for BM25-based full-text ranking that plugs directly into SQL and runs on Tiger Cloud-managed Postgres instances. It replaces Postgres’s ts_rank-style heuristics with corpus-aware BM25 scoring (inverse document frequency weighting, term-frequency saturation, and length normalization), backed by a memtable-based index and exposed through a distance operator and score thresholds. Scores are returned as negative values (more negative = better match). Installation goes through the Tiger Cloud console: enable the extension on new services or update existing instances (a restart may be required). The project's docs cover creating single-column BM25 indexes, query patterns, EXPLAIN verification, and operational best practices.

For the AI/ML community this matters because pg_textsearch brings corpus-aware, ranked keyword retrieval inside Postgres and is explicitly designed to pair with vector search (pgvector/pgvectorscale) for hybrid semantic+keyword systems.

Key technical caveats: the preview is memory-only (index_memory_limit defaults to 64MB), supports only single-column indexes, and lacks phrase queries and disk-backed segments. These constraints limit corpus size and some query types until future releases. Practical implications: plan memory based on vocabulary and document counts, select language configs carefully, use score thresholds and EXPLAIN to optimize queries, and combine BM25 with vector search for richer retrieval in embeddings-based pipelines.
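The three BM25 ingredients named in the summary (IDF weighting, term-frequency saturation, and length normalization) can be illustrated with a small standalone sketch. This is not pg_textsearch's implementation: the function name `bm25_scores`, the whitespace tokenizer, the Lucene-style IDF variant, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration. The final negation only mirrors the "more negative = better match" convention described above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (higher = better match).

    Hypothetical helper for illustration; k1 and b are common textbook
    defaults, not values confirmed for pg_textsearch.
    """
    # Assumption: naive lowercase/whitespace tokenization, no stemming.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF weighting: rarer terms contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturation: repeated occurrences add diminishing weight.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "postgres full text search with bm25 ranking",
    "vector search with embeddings",
    "postgres postgres postgres extension notes and many other unrelated words here",
]
scores = bm25_scores("postgres bm25", docs)

# Mirror the negative-score convention: sort ascending, best match first.
negated = [-s for s in scores]
ranking = sorted(range(len(docs)), key=lambda i: negated[i])
print(ranking)
```

In this toy corpus the first document matches both query terms and ranks first, while the third document repeats "postgres" three times yet scores lower, because term frequency saturates and its extra length is penalized.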
