🤖 AI Summary
We benchmarked eight leading rerankers under identical RAG conditions to find which performs best for real-world applications. Each reranker used BGE-small-en-v1.5 embeddings with FAISS (top-50) retrieval and was evaluated across six datasets (finance, business, essays, web, facts, science) for latency, nDCG, and Recall. To capture perceived relevance we ran a novel LLM-judged Elo evaluation: GPT-5 compared anonymized top-5 result lists from pairwise reranker matchups and picked winners, producing Elo-style preference scores that reflect human-like judgments of result quality.
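The pairwise-matchup scoring can be sketched with the standard Elo update: each time the LLM judge prefers one reranker's top-5 list, that reranker's rating shifts up in proportion to how surprising the win was. This is a minimal illustrative sketch; the reranker names, initial rating of 1000, and K-factor of 32 are assumptions, not details from the benchmark.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed pairwise outcome."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - exp_win)  # winner gains more for an upset
    ratings[loser] -= k * (1.0 - exp_win)   # loser pays symmetrically

# Hypothetical matchup log: each entry is the LLM judge's verdict on a
# pair of anonymized top-5 result lists (winner, loser).
ratings = {"reranker_a": 1000.0, "reranker_b": 1000.0}
matchups = [("reranker_a", "reranker_b"),
            ("reranker_a", "reranker_b"),
            ("reranker_b", "reranker_a")]
for winner, loser in matchups:
    update_elo(ratings, winner, loser)
```

With a 2–1 record, `reranker_a` ends above `reranker_b`; over many matchups across datasets, the ratings converge to a stable preference ordering.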
Results: Zerank-1 scored highest on the LLM Elo (best relevance), while Voyage Rerank 2.5 was a close second and delivered roughly 2× lower latency, making it the most pragmatic choice for production RAG where both speed and quality matter. Cohere v3.5 was the fastest but less preferred by the LLM judge; CTXL-Rerank v2 and BGE v2-M3 showed strong domain-specific peaks (science/facts or select niches) but less consistency. Visual trade-off charts and radar plots highlighted which models are stable generalists (Zerank-1, Voyage 2.5) versus specialists. Bottom line: choose Zerank-1 for top relevance, Voyage 2.5 for the best speed/quality balance, and consider specialist rerankers when targeting domain-specific retrieval.