Show HN: Self-hosted RAG for docs and code (FastAPI, Docling, ChromaDB) (github.com)

🤖 AI Summary
LocalRAG is a Docker-first, self-hosted RAG toolkit that separates code and prose during ingestion to deliver more accurate, privacy-preserving retrieval. It ships as a zero-config stack (`docker compose up`) with a FastAPI backend (LlamaIndex orchestration), ChromaDB for persistent vectors, Ollama for local embeddings/LLMs, and Docling for advanced parsing.

The key innovation is context-aware ingestion: AST-based chunking for code (function/method boundaries) and semantic chunking for documents, stored in separate collections so API docs don't "pollute" code queries. You can run it locally, pull embeddings (e.g., `ollama pull nomic-embed-text`), ingest repos or PDFs via the UI or REST API, and query collections selectively or across multiple collections at once. For practitioners, this matters because treating code like prose reduces retrieval precision; AST chunking and per-collection profiles yield more relevant, snippet-ready answers for tasks like onboarding, debugging, and code search.

The stack is production-friendly (ChromaDB 0.5.23, LlamaIndex 0.12.9, Docling 2.13.0), supports batch ingestion, `.ragignore`, and hot-reload for development, and is extensible to other LLM providers (OpenAI/Anthropic/Cohere), rerankers, and multimodal features. Open-source (MIT) and REST-first, LocalRAG is a pragmatic option for teams wanting local, auditable RAG with code-aware retrieval out of the box.
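Based on the commands the summary mentions, the local quickstart looks roughly like this (repository URL and service names are assumptions, not confirmed details):

```shell
# Clone the project (URL assumed from the Show HN link) and start the stack.
git clone https://github.com/<owner>/localrag.git
cd localrag

# Zero-config startup: FastAPI backend, ChromaDB, and Ollama come up together.
docker compose up

# Pull a local embedding model for Ollama, as the summary suggests.
ollama pull nomic-embed-text
```

From there, repos and PDFs can be ingested via the UI or the REST API.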
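To illustrate what "AST-based chunking at function/method boundaries" means in practice, here is a minimal, self-contained sketch using Python's stdlib `ast` module. This is an illustrative reimplementation of the idea, not LocalRAG's actual code; the function name and behavior are assumptions:

```python
import ast

def chunk_by_ast(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class,
    so each retrieval unit aligns with a semantic boundary instead of a
    fixed character window (the naive approach that 'pollutes' results)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

example = '''
def add(a, b):
    return a + b

class Greeter:
    def hello(self, name):
        return f"hi {name}"
'''

# Each chunk is a complete definition, ready to embed as one unit.
for chunk in chunk_by_ast(example):
    print(chunk.splitlines()[0])
```

Chunking this way keeps a function's signature and body together, which is what makes the retrieved snippets directly usable for code search and debugging queries.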