Hybrid Search vs. Agentic Search Benchmarks (weaviate.io)

🤖 AI Summary
Query Agent’s general availability introduces a new Search Mode — a compound, “agentic” retrieval pipeline built to sit between simple vector/database search and full RAG systems. The team benchmarked Search Mode against a canonical Hybrid Search stack (Weaviate hybrid: vector search with Snowflake Arctic 2.0 embeddings + BM25, fused via Reciprocal Rank Fusion) across 12 IR datasets from BEIR, LoTTe, BRIGHT, EnronQA and WixQA.

Across the board, Search Mode delivered consistent gains (mean improvements such as +17% Success@1 and +11% Recall@5), with minimum per-benchmark gains of +5% and peaks up to +24% (BRIGHT Biology). Example wins include Natural Questions Success@1 rising from 0.43→0.52 and SciFact from 0.58→0.69; the BRIGHT subsets — designed for long, reasoning-heavy queries — show especially large boosts in nDCG and Recall.

Technically, Search Mode combines IR best practices — query expansion and decomposition, schema introspection, and learned reranking — to prioritize recall and surface the most relevant document first. Experiments were run three times to account for model stochasticity, and results are reproducible via their GitHub repo.

The practical implication for the AI/ML community is clear: for recall- and reasoning-intensive search (long queries, domain-specific corpora, private email or documentation), agentic compound retrieval can substantially outperform hybrid vector+BM25 solutions, albeit with higher inference latency — so choose Search Mode for quality-critical applications and hybrid search for latency-sensitive deployments.
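For readers unfamiliar with the hybrid baseline's fusion step, Reciprocal Rank Fusion merges the vector-search and BM25 ranked lists by scoring each document as the sum of 1/(k + rank) over the lists it appears in. A minimal sketch (the constant k=60 is the commonly used default, assumed here; the document IDs are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_of_d).

    `rankings` is a list of ranked result lists (best first); documents
    appearing high in multiple lists accumulate the largest scores.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return document IDs ordered by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ranked lists from the two retrievers:
vector_hits = ["d3", "d1", "d2"]   # dense (embedding) retrieval
bm25_hits = ["d1", "d4", "d3"]     # sparse (BM25) retrieval
fused = rrf_fuse([vector_hits, bm25_hits])
# d1 (ranks 2 and 1) edges out d3 (ranks 1 and 3).
```

Weaviate's actual hybrid query exposes this fusion server-side; the sketch just makes the ranking arithmetic concrete.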
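The reported metrics are standard IR measures: Success@1 asks whether the top-ranked document is relevant, and Recall@5 measures the fraction of relevant documents recovered in the top five. A minimal sketch of both (per-query; benchmark scores are the mean over all queries):

```python
def success_at_k(ranked, relevant, k=1):
    """1.0 if any of the top-k ranked documents is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def recall_at_k(ranked, relevant, k=5):
    """Fraction of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Hypothetical query result: six ranked docs, two of which are relevant.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"b", "f"}
s1 = success_at_k(ranked, relevant, k=1)   # top doc "a" is not relevant
r5 = recall_at_k(ranked, relevant, k=5)    # only "b" of the two is in the top 5
```

So a +17% mean Success@1 gain means the very first result is relevant for noticeably more queries — the "surface the most relevant document first" goal the pipeline optimizes for.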