🤖 AI Summary
Intercom built Fin-cx-reranker, a custom reranker for its Fin AI Agent RAG pipeline that outperforms the commercial Cohere Rerank v3.5 while cutting reranking costs by ~80% and keeping production latency unchanged. The reranker improves answer relevance in customer-support workflows (better resolution rates in a 1.5M-conversation A/B test, p < 0.01) and reduces vendor dependency, showing that a targeted, domain-specific model can beat a top off-the-shelf service for English support content.
Technically, Fin-cx-reranker is an encoder-only model based on ModernBERT-large (8,192-token context, rotary/relative positional encodings, GeGLU, efficient attention; pretrained on ~2T tokens). Each query–passage pair is encoded as [CLS] query [SEP] passage [SEP], mean-pooled, and fed through a linear head to produce a relevance score. The model was trained on 400k real Fin queries (K=40 candidates → 16M pairs) using RankNet pairwise loss with labels distilled from an LLM-based teacher. Evaluation used a three-stage funnel: the FinRank-en-v1 offline benchmark (3k queries), where Fin-cx-reranker raised MAP from 0.521 to 0.612 (+17.5%), NDCG@10 from 0.570 to 0.665 (+16.7%), Recall@10 from 0.636 to 0.720 (+13.1%), and Kendall tau from 0.326 to 0.400 (+22.7%); backtesting; and live A/B tests, which also showed precision/recall gains with no latency penalty (P50 ≈ 150 ms). Next steps include stronger label re-annotations and multilingual expansion.
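The scoring and training recipe above can be sketched minimally: mean-pool the encoder's token embeddings, apply a linear head for a scalar relevance score, and train scores with the RankNet pairwise loss, which penalizes a relevant passage scoring below an irrelevant one. This is an illustrative NumPy sketch, not Intercom's implementation; the function names and shapes are assumptions.

```python
import numpy as np

def relevance_score(token_embeddings: np.ndarray, w: np.ndarray, b: float) -> float:
    """Hypothetical scoring head: mean-pool (seq_len, hidden) token
    embeddings from the encoder, then apply a linear layer -> scalar."""
    pooled = token_embeddings.mean(axis=0)  # (hidden,)
    return float(pooled @ w + b)

def ranknet_loss(score_pos: float, score_neg: float) -> float:
    """RankNet pairwise loss: -log sigmoid(s_pos - s_neg),
    computed stably as log(1 + exp(-(s_pos - s_neg)))."""
    return float(np.logaddexp(0.0, -(score_pos - score_neg)))
```

The loss shrinks toward zero as the positive passage outscores the negative by a larger margin, and equals log 2 when the two scores tie, which is what drives the pairwise ranking signal during training.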