Yore – Deterministic document indexer for large, agent-driven codebases (github.com)

0 points | 225 days ago

🤖 AI Summary

Yore is a fast, deterministic documentation indexer and retrieval pipeline built to supply high-signal, token-aware context to LLMs and automation agents working over large, messy codebases. Rather than returning file lists, Yore answers the question “given this question and a fixed token budget, what exact slice of docs should an LLM see?” It tackles documentation sprawl by computing canonicality scores, detecting duplicate documents and sections, following Markdown links and ADR chains, validating link structure, and exposing these signals programmatically so that agents (and humans) can make consistent, safe decisions. Technically, Yore combines BM25 retrieval with structural metadata, link-graph analysis, and multi-metric duplicate detection (Jaccard + MinHash + SimHash) in a reproducible pipeline that performs cross-reference expansion, then extractive refinement that preserves whole sentences and code blocks, and finally token-aware trimming. Key commands include yore build (indexing), yore query, yore assemble (context assembly controlled by depth, max-tokens, and max-sections), dupes and dupes-sections, canonicality, backlinks and orphans, check-links, and yore eval (a JSONL test harness for regression detection). It intentionally avoids sampling and embeddings so that its output stays deterministic, and it sits atop Lucene-like primitives to provide LLM-aware retrieval, governance, and repeatable evaluation for agent-driven documentation workflows.
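
To make the "fixed token budget" idea concrete, here is a minimal Python sketch of BM25 ranking followed by greedy, budget-aware selection. It is not Yore's code or API: the function names (`bm25_scores`, `assemble`), the section dictionary shape, the regex tokenizer, and the word-count token estimate are all assumptions made for illustration. The point is only that scoring plus a stable tie-break yields the same selection on every run.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document (non-negative IDF variant)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Sorted so the floating-point summation order is fixed across runs.
    query_terms = sorted(set(tokenize(query)))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def assemble(query, sections, max_tokens):
    """Greedily pick the best-scoring sections that still fit the budget."""
    scores = bm25_scores(query, [s["text"] for s in sections])
    # Tie-break on section id so the result never depends on input order.
    ranked = sorted(zip(scores, sections), key=lambda p: (-p[0], p[1]["id"]))
    picked, used = [], 0
    for score, sec in ranked:
        cost = len(tokenize(sec["text"]))  # crude stand-in for a real token count
        if score > 0 and used + cost <= max_tokens:
            picked.append(sec["id"])
            used += cost
    return picked
```

Calling `assemble("authentication tokens", sections, max_tokens=8)` on a list of `{"id": ..., "text": ...}` dictionaries returns the ids of the selected sections in rank order. A real pipeline of the kind the summary describes would add link expansion and extractive refinement before trimming; this sketch covers only the ranking and budget steps.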
