Show HN: PromptCache – a self-hosted semantic cache for LLMs (Go and BadgerDB) (github.com)

🤖 AI Summary
PromptCache is a newly released self-hosted semantic caching proxy for LLMs, implemented in Go with BadgerDB, that sits between your application and an LLM provider, detects repeated intents, and serves cached completions. It targets high-volume GenAI workloads such as RAG pipelines, agent tool calls, and support bots, where many prompts are variations of the same question. The project reports substantial gains on real workloads: cost per 1,000 requests dropping from ~$30 to ~$6, median latency falling from ~1.5s to ~300ms, and relief from external API throughput limits. It is drop-in compatible with the OpenAI API; integration amounts to pointing the client's base_url at the proxy.

Technically, it balances safety and performance with a two-stage verification scheme: high-similarity prompts yield direct cache hits, low-similarity prompts skip the cache, and "gray zone" matches are verified with a small, inexpensive intent-check model to avoid incorrect reuse. The service is pure Go and optimized for concurrency, uses BadgerDB plus an in-memory layer for speed, and supports Redis-backed distributed caches, clustered replication, custom embedding backends (local or Ollama), and multiple providers (OpenAI, Claude, Mistral). Extras include rate limiting, a dashboard with hit-rate, latency, and cost metrics, and an MIT license, making it practical for teams that want predictable cost and latency control without changing application code.
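As a rough illustration of the drop-in claim, here is a minimal Go client sketch that routes OpenAI-style requests through the proxy. The listen address (localhost:8080), the /v1 path, the model name, and the use of the sashabaranov/go-openai client are all assumptions for illustration and are not taken from the PromptCache docs:

```go
// Point an OpenAI-compatible Go client at the PromptCache proxy instead of
// api.openai.com; the application code is otherwise unchanged.
package main

import (
	"context"
	"fmt"
	"log"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	cfg := openai.DefaultConfig("YOUR_OPENAI_KEY") // key is still forwarded upstream
	cfg.BaseURL = "http://localhost:8080/v1"       // assumed PromptCache endpoint

	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model: "gpt-4o-mini",
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "How do I reset my password?"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	// Repeated or near-duplicate prompts should now be served from the cache.
	fmt.Println(resp.Choices[0].Message.Content)
}
```

Because only base_url changes, the same swap should work for any OpenAI-compatible SDK in other languages.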
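The gray-zone check can be pictured as a three-way decision on the similarity score. The sketch below is a hypothetical Go rendering of that logic; the threshold values, the IntentVerifier interface, and the decide function are illustrative assumptions, not code from the PromptCache repository:

```go
// Sketch of the two-stage cache decision: trust strong matches, reject weak
// ones, and ask a small verifier model about everything in between.
package main

import "fmt"

// IntentVerifier stands in for the small, inexpensive model that checks
// whether two prompts really ask the same thing.
type IntentVerifier interface {
	SameIntent(cachedPrompt, newPrompt string) (bool, error)
}

type cacheDecision int

const (
	serveFromCache cacheDecision = iota
	callProvider
)

// decide routes on the embedding similarity: direct hit above hitThreshold,
// direct miss below missThreshold, and a verifier call in the gray zone.
func decide(similarity float64, cachedPrompt, newPrompt string, v IntentVerifier) (cacheDecision, error) {
	const (
		hitThreshold  = 0.95 // assumed: above this, trust the embedding match
		missThreshold = 0.80 // assumed: below this, skip the cache entirely
	)

	switch {
	case similarity >= hitThreshold:
		return serveFromCache, nil
	case similarity < missThreshold:
		return callProvider, nil
	default: // gray zone: the cheap model confirms or rejects the match
		same, err := v.SameIntent(cachedPrompt, newPrompt)
		if err != nil || !same {
			return callProvider, err
		}
		return serveFromCache, nil
	}
}

// trivialVerifier always confirms the match; in PromptCache this role is
// played by an inexpensive intent-check model.
type trivialVerifier struct{}

func (trivialVerifier) SameIntent(a, b string) (bool, error) { return true, nil }

func main() {
	d, _ := decide(0.87, "How do I reset my password?", "How can I reset my password", trivialVerifier{})
	if d == serveFromCache {
		fmt.Println("gray-zone match verified: serving cached completion")
	} else {
		fmt.Println("calling the upstream provider")
	}
}
```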