Show HN: Synthetic data generation for evaluating RAGs (docs.kiln.tech)

🤖 AI Summary
A new workflow in Kiln’s Eval UI makes it easy to generate synthetic Q&A datasets from your document library to measure “reference answer accuracy” for retrieval-augmented generation (RAG) systems. The tool extracts text from documents (PDF/HTML → markdown/plain text), generates realistic user queries and concise reference answers using a chosen model and guidance, and tags and saves the pairs for evaluation. You can review and prune pairs, reuse prior extractions, and avoid maintaining a separate golden set because the generated answers serve as ground truth. This systematic approach matters because it scales RAG evaluation and helps pinpoint which retrieval and model choices actually improve end-to-end accuracy.

Kiln supports experimenting across extractor configs, chunking strategies (fixed-window or semantic, with size/overlap), embedding models, index types (full-text, vector, hybrid) and rerankers, plus task-model and prompt variations. A configurable “judge” (LLM-based or G-Eval) runs comparisons and reports average scores across run configurations so you can compare K/N settings and model combinations. For teams building RAG pipelines, this reduces manual test creation, speeds iteration, and yields actionable insights into how retrieval, chunking, and model choices interact.
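To make the workflow concrete, here is a minimal, illustrative sketch of the general pattern described above: fixed-window chunking with overlap, an LLM prompt that produces a question plus a concise reference answer per chunk, and a JSONL dataset for later evaluation. This is not Kiln's API; the `llm` callable, prompt wording, chunk sizes, and output file name are all assumptions for illustration.

```python
# Illustrative sketch only (not Kiln's API): chunk extracted markdown,
# generate one Q&A pair per chunk, and save the pairs as an eval dataset.
import json
from typing import Callable


def fixed_window_chunks(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-window chunking with overlap (character-based for simplicity)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


PROMPT = (
    "From the passage below, write one realistic user question and a concise "
    "reference answer grounded only in the passage.\n"
    'Return JSON: {{"question": ..., "reference_answer": ...}}\n\n'
    "Passage:\n{chunk}"
)


def generate_pairs(docs: dict[str, str], llm: Callable[[str], str],
                   out_path: str = "rag_eval_pairs.jsonl") -> None:
    """`llm` is a placeholder: any callable that sends a prompt to the model
    of your choice and returns its text response. Real code should also
    validate the model's JSON output before accepting a pair."""
    with open(out_path, "w") as f:
        for doc_id, markdown in docs.items():
            for chunk in fixed_window_chunks(markdown):
                pair = json.loads(llm(PROMPT.format(chunk=chunk)))
                pair["source_doc"] = doc_id  # keep provenance for review/pruning
                f.write(json.dumps(pair) + "\n")
```

And a small sketch of the reporting step the summary mentions: averaging judge scores per run configuration so different retrieval/model combinations can be compared. The `results` record shape is assumed, not Kiln's data model.

```python
from collections import defaultdict
from statistics import mean


def average_scores(results: list[dict]) -> dict[str, float]:
    """results: [{"run_config": "hybrid+rerank", "score": 0.8}, ...]
    Returns the mean judge score per run configuration."""
    by_config: dict[str, list[float]] = defaultdict(list)
    for r in results:
        by_config[r["run_config"]].append(r["score"])
    return {cfg: mean(scores) for cfg, scores in by_config.items()}
```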