TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework (arxiv.org)

🤖 AI Summary
Researchers introduced TeaRAG, a token-efficient agentic Retrieval-Augmented Generation (RAG) framework that compresses both retrieved knowledge and reasoning steps to cut the heavy token overhead of multi-round, agentic RAG systems.

On the retrieval side, TeaRAG pairs chunk-based semantic retrieval with a compact graph retrieval built from concise triplets. It constructs a knowledge-association graph (edges from semantic similarity and co-occurrence) and runs Personalized PageRank to surface the most salient facts, drastically shrinking the token footprint of each retrieval step.

To shorten reasoning chains, TeaRAG proposes Iterative Process-aware Direct Preference Optimization (IP-DPO): a reward that measures knowledge sufficiency via a knowledge-matching mechanism and penalizes excessive reasoning, producing higher-quality preference pairs for iterative DPO training that favors concise yet sufficient reasoning. In this way, TeaRAG addresses the common accuracy-vs-efficiency trade-off in agentic RAG by jointly optimizing information density in retrieval and step efficiency in reasoning.

Evaluated on six datasets, TeaRAG raises average Exact Match by ~4% on Llama3-8B-Instruct and ~2% on Qwen2.5-14B-Instruct while cutting output tokens by 61% and 59%, respectively. The approach reduces cost and latency for deployed RAG agents and could enable more practical multi-hop or agentic workflows where token budget and prompt length are constraints. Code and resources are publicly available.
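To make the retrieval-compression step concrete, here is a minimal sketch (not the authors' code) of the idea described above: build a knowledge-association graph over candidate triplets, with edges from semantic similarity and entity co-occurrence, then run Personalized PageRank seeded by the query to keep only the most salient facts. Function names, the similarity threshold, and the use of precomputed embeddings are illustrative assumptions.

```python
# Illustrative sketch of salient-fact selection via Personalized PageRank,
# assuming triplets are (head, relation, tail) strings and embeddings are
# precomputed dense vectors (one per triplet, plus one for the query).

import networkx as nx
import numpy as np


def build_association_graph(triplets, embeddings, sim_threshold=0.7):
    """Nodes are candidate triplets; edges connect pairs that are either
    semantically similar (cosine similarity above a threshold) or share an
    entity (co-occurrence)."""
    g = nx.Graph()
    g.add_nodes_from(range(len(triplets)))
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    for i in range(len(triplets)):
        for j in range(i + 1, len(triplets)):
            head_i, _, tail_i = triplets[i]
            head_j, _, tail_j = triplets[j]
            co_occurs = bool({head_i, tail_i} & {head_j, tail_j})
            if sims[i, j] >= sim_threshold or co_occurs:
                # Keep edge weights non-negative for PageRank.
                g.add_edge(i, j, weight=max(float(sims[i, j]), 1e-6))
    return g


def select_salient_triplets(triplets, embeddings, query_embedding, top_k=5):
    """Personalized PageRank with the restart distribution biased toward
    triplets most similar to the query; return the top-k facts."""
    g = build_association_graph(triplets, embeddings)
    q = query_embedding / np.linalg.norm(query_embedding)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_sims = np.clip(normed @ q, 0.0, None)
    personalization = {i: float(s) + 1e-9 for i, s in enumerate(query_sims)}
    scores = nx.pagerank(g, alpha=0.85, personalization=personalization)
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [triplets[i] for i in ranked]
```

Only the top-ranked triplets would then be serialized into the agent's context, which is where the per-retrieval token savings come from; the exact graph construction, weighting, and seeding in TeaRAG may differ from this sketch.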