Teaching Models to Decide When to Retrieve: Adaptive RAG, Part 4 (blog.reachsumit.com)

🤖 AI Summary
This final installment reframes adaptive retrieval as a learned skill: rather than applying heuristics to decide when to call a retriever, researchers train lightweight models (or fine-tune LLMs) to make that decision. The approaches fall into three families, each trading off training complexity, inference cost, and decision sophistication: separate "gatekeeper" classifiers that route queries, fine-tuned LLMs that self-signal knowledge gaps, and LLMs trained to reason iteratively about what they know and still need. The payoff is fewer unnecessary RAG calls (lower latency and cost), fewer hallucinations, and plug-and-play solutions that work with closed-source backbones.

Key technical patterns across the papers:

- Gatekeepers often use the backbone's final-layer hidden state or last-token embedding as classifier input. Jeong et al.'s T5-Large router classifies queries into A/B/C complexity levels (roughly: answerable directly, needs a single retrieval, needs iterative retrieval); a sketch of this pattern follows the list.
- UAR trains four binary classifiers (intent, knowledge, time-sensitivity, and self-awareness) and unifies them in a decision tree (see the cascade sketch below).
- Zeng et al. apply post-retrieval classifiers for internal knowledge, helpfulness, and contradiction to filter documents before generation.
- RAGate-MHA trains a multi-head-attention encoder on the dialogue context; the context-only variant performed best (sketched below).
- KBM builds per-model "knowledge boundaries" by sampling 30 outputs per question to compute mastery (accuracy) and certainty (entropy), which serve as soft labels for a downstream decision model (see the final sketch below).
- DioR adds proactive and reactive detection, using attribution entropy and an RNN to flag likely hallucinations.

Overall, these modular classifiers provide efficient, model-specific retrieval control and can be combined to balance cost, accuracy, and safety.
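To make the gatekeeper pattern concrete, here is a minimal sketch of a query-complexity router: a frozen encoder supplies the last-token hidden state and a small trainable head maps it to the A/B/C labels. The encoder choice (`bert-base-uncased`), the pooling, and the untrained head are illustrative assumptions, not the post's exact setup; Jeong et al. use T5-Large, and other gatekeepers feed an LLM's own final-layer embedding instead.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Illustrative stand-in encoder; the post's router fine-tunes T5-Large.
ENCODER = "bert-base-uncased"
LABELS = ["A", "B", "C"]  # A: answer directly, B: single retrieval, C: iterative

tokenizer = AutoTokenizer.from_pretrained(ENCODER)
encoder = AutoModel.from_pretrained(ENCODER).eval()
for p in encoder.parameters():   # freeze the backbone; only the head is trained
    p.requires_grad = False

head = nn.Linear(encoder.config.hidden_size, len(LABELS))

def route(query: str) -> str:
    """Pick a retrieval strategy from the query's last-token embedding."""
    batch = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (1, seq_len, dim)
    feats = hidden[:, -1]        # last-token state (here [SEP]); LLM gatekeepers
    logits = head(feats)         # use the decoder's final-layer last token
    return LABELS[logits.argmax(-1).item()]

# The head would be trained with cross-entropy on complexity-labeled queries;
# untrained as written, so this call returns an arbitrary label.
print(route("Who wrote The Old Man and the Sea?"))
```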
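UAR's unification step can be sketched as a short cascade over the four binary signals. The branch order and the stub classifiers below are assumptions for illustration; the paper defines its own decision tree, and its classifiers are lightweight heads over the LLM's representations rather than string heuristics.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class UARClassifiers:
    # Each field stands in for a trained binary classifier over the query.
    intent: Callable[[str], bool]       # user explicitly wants retrieval
    knowledge: Callable[[str], bool]    # query is inside parametric knowledge
    time: Callable[[str], bool]         # query is time-sensitive
    self_aware: Callable[[str], bool]   # model is confident in its own answer

def should_retrieve(query: str, c: UARClassifiers) -> bool:
    """Combine four binary signals in a fixed cascade (order assumed here)."""
    if c.intent(query):           # explicit user intent overrides everything
        return True
    if c.time(query):             # time-sensitive: model weights go stale
        return True
    if not c.knowledge(query):    # outside the model's parametric knowledge
        return True
    return not c.self_aware(query)  # retrieve when the model doubts itself

# Toy usage with stub classifiers.
stub = UARClassifiers(
    intent=lambda q: "search" in q.lower(),
    knowledge=lambda q: True,
    time=lambda q: "today" in q.lower(),
    self_aware=lambda q: True,
)
print(should_retrieve("What is the weather today?", stub))  # True (time-sensitive)
```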
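A RAGate-MHA-style gate can be sketched as a small self-attention encoder over the tokenized dialogue context with a binary retrieve/don't-retrieve output. The layer sizes, depth, and mean-pooling below are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RAGateMHA(nn.Module):
    """Sketch: encode dialogue context with multi-head self-attention,
    then predict whether this turn needs retrieval."""
    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.gate = nn.Linear(d_model, 2)   # 0 = answer directly, 1 = retrieve

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))  # (batch, seq, d_model)
        pooled = h.mean(dim=1)                   # pool over the context
        return self.gate(pooled)                 # logits over {no-RAG, RAG}

# Context-only variant: feed just the dialogue history, with no retrieved
# knowledge appended, which the post reports performed best.
model = RAGateMHA(vocab_size=32000)
logits = model(torch.randint(0, 32000, (1, 64)))  # fake token ids
print(logits.softmax(-1))
```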
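Finally, KBM's soft labels are straightforward to compute from sampled generations. The post states only that accuracy and entropy over ~30 samples are used; the exact-match answer comparison and the entropy normalization below are assumptions.

```python
import math
from collections import Counter

def knowledge_boundary(sampled_answers: list[str], gold: str):
    """KBM-style soft labels from N sampled generations for one question.

    mastery   = fraction of sampled answers that are correct (accuracy)
    certainty = 1 - normalized entropy of the answer distribution
    """
    n = len(sampled_answers)  # the post samples 30 outputs per question
    norm = lambda a: a.strip().lower()
    mastery = sum(norm(a) == norm(gold) for a in sampled_answers) / n
    counts = Counter(norm(a) for a in sampled_answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    certainty = 1.0 - entropy / math.log(n)  # log(n) = all-distinct entropy
    return mastery, certainty

# 27 consistent answers out of 30 -> high mastery, high certainty; these
# values train the downstream retrieve/skip decision model.
answers = ["paris"] * 27 + ["lyon", "paris?", "marseille"]
print(knowledge_boundary(answers, "Paris"))
```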