So you wanna build a local RAG? (blog.yakkomajuri.com)

🤖 AI Summary
Skald announced a fully self-hostable, privacy-first RAG stack that can run without sending data to third parties, showing that practical local retrieval pipelines are achievable today. They built a default open-source setup using Postgres + pgvector for the vector DB, SentenceTransformers (all-MiniLM-L6-v2) for embeddings, a Sentence-Transformers cross-encoder reranker, and Docling for document parsing, leaving the LLM choice to the user (tested with GPT-OSS 20B via llama.cpp). Deploying this stack took 8 minutes, and the experiments used conservative settings (vector topK=100, rerank topK=50, distance threshold=0.8).

Benchmarks comparing cloud vs. local found: Voyage+Claude (cloud) scored 9.45/10; Voyage embeddings + GPT-OSS 20B scored 9.18; a fully local English-centric setup (MiniLM embeddings + MiniLM reranker) scored 7.10; and a stronger multilingual combo (bge-m3 embeddings + mmarco multilingual reranker) reached 8.63. The results show local RAGs already serve many privacy-sensitive use cases—point queries in English are fast and reliable—while limitations remain around multilingual handling and aggregating context across many documents.

Skald's tests highlight clear trade-offs: cheaper/faster default models vs. higher-accuracy multilingual or aggregation-capable models, and the need for additional techniques (better rerankers, chunk-aggregation strategies, or larger LLMs) to close the remaining gap. Skald plans more extensive OSS benchmarks and optimizations for air-gapped or regulated deployments.
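The retrieve-then-rerank flow described above (vector topK=100 → distance threshold 0.8 → rerank topK=50) can be sketched in plain Python. This is a toy illustration, not Skald's code: the cosine-distance stage stands in for pgvector's `ORDER BY embedding <=> query LIMIT k`, and the `rerank_score` callback stands in for the cross-encoder; all function and parameter names here are assumptions.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def retrieve(query_vec, corpus, rerank_score,
             vector_top_k=100, rerank_top_k=50, max_distance=0.8):
    """Two-stage retrieval: vector search, threshold filter, then rerank.

    corpus: list of (text, embedding) pairs.
    rerank_score: stand-in for a cross-encoder scoring function (query is
    implicit here; a real reranker scores (query, passage) pairs).
    """
    # Stage 1: nearest-neighbour search, mimicking
    # `SELECT ... ORDER BY embedding <=> query LIMIT vector_top_k` in pgvector.
    scored = sorted(
        (cosine_distance(query_vec, vec), text) for text, vec in corpus
    )[:vector_top_k]
    # Drop candidates beyond the distance threshold.
    candidates = [text for dist, text in scored if dist <= max_distance]
    # Stage 2: rerank the surviving top candidates, highest score first.
    return sorted(candidates[:rerank_top_k], key=rerank_score, reverse=True)
```

In the real stack, `query_vec` and the corpus embeddings would come from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`, and `rerank_score` from a `CrossEncoder.predict` call over (query, passage) pairs.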