Ever tried running a full RAG system with zero cloud dependency? (github.com)

🤖 AI Summary
A new fully local Retrieval-Augmented Generation (RAG) project lets you index and query PDF collections entirely on your machine—no API keys or cloud services required. The pipeline uses PyMuPDF to extract text, LangChain’s RecursiveCharacterTextSplitter to chunk it, HuggingFace sentence-transformers for embeddings, FAISS for vector search, and Ollama as a local LLM runner. Repo scripts (indexer.py and chatbot.py) automate the extract→split→embed→index and query steps; models are downloaded once and cached.

Minimum requirements are Python 3.8+, ~8GB RAM (16GB recommended), ~5GB of free disk, and internet access only for initial setup. Configuration is handled via a .env file (OLLAMA_MODEL, EMBEDDING_MODEL, TOP_K), and you can accelerate embeddings and LLM inference on a CUDA-capable GPU by toggling model_kwargs={'device':'cuda'}.

This is significant for AI/ML practitioners who need privacy, offline operation, reproducibility, or low-cost RAG demos—especially for sensitive documents or teaching workshops. Technical trade-offs are clear: smaller embedding/LLM models (all-MiniLM-L6-v2, phi3) are fast and lightweight, while larger options (intfloat/e5-large-v2, qwen2.5) improve quality at the cost of RAM, disk, and download time. FAISS index persistence avoids re-indexing unless the content or embedding model changes. The repo also includes troubleshooting tips (faiss-cpu on Windows, Ollama service checks, OOM mitigations), making it a useful, locally hosted RAG reference.
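To make the described extract→split→embed→index→query flow concrete, here is a minimal sketch of that kind of pipeline. It is not the repo's actual indexer.py or chatbot.py: the function names (index_pdfs, answer), the ./pdfs and faiss_index paths, the chunk sizes, the prompt, and the .env defaults are illustrative assumptions, and the LangChain import paths and FAISS.load_local kwargs vary by library version.

```python
# Illustrative sketch only, not the repo's indexer.py/chatbot.py.
# Assumes: pymupdf, langchain-community, sentence-transformers, faiss-cpu,
# python-dotenv, and a running Ollama daemon with the chosen model pulled.
import os
import glob

import fitz  # PyMuPDF
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama

load_dotenv()  # reads OLLAMA_MODEL, EMBEDDING_MODEL, TOP_K from .env
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
TOP_K = int(os.getenv("TOP_K", "4"))

embeddings = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL,
    model_kwargs={"device": "cpu"},  # switch to {"device": "cuda"} on a CUDA GPU
)


def index_pdfs(pdf_dir: str, index_dir: str = "faiss_index") -> FAISS:
    """Extract -> split -> embed -> index, then persist the FAISS store to disk."""
    texts = []
    for path in glob.glob(os.path.join(pdf_dir, "*.pdf")):
        with fitz.open(path) as doc:
            texts.append("\n".join(page.get_text() for page in doc))
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_text("\n\n".join(texts))
    store = FAISS.from_texts(chunks, embeddings)
    store.save_local(index_dir)  # reused on later runs unless docs or model change
    return store


def answer(question: str, index_dir: str = "faiss_index") -> str:
    """Retrieve TOP_K chunks and let the local Ollama model answer from them."""
    store = FAISS.load_local(index_dir, embeddings, allow_dangerous_deserialization=True)
    docs = store.similarity_search(question, k=TOP_K)
    context = "\n\n".join(d.page_content for d in docs)
    llm = Ollama(model=OLLAMA_MODEL)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)


if __name__ == "__main__":
    index_pdfs("./pdfs")
    print(answer("What are these documents about?"))
```

Swapping EMBEDDING_MODEL for intfloat/e5-large-v2 or OLLAMA_MODEL for qwen2.5 in the .env reflects the quality-versus-footprint trade-off the summary describes; nothing else in the sketch needs to change, though the FAISS index must be rebuilt when the embedding model changes.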