Show HN: Autocache – Cut Claude API costs 90% (for n8n, Flowise, etc.) (github.com)

🤖 AI Summary
Autocache is a self-hosted, transparent proxy for the Anthropic Claude API that automatically injects cache_control fields into requests so that large, stable parts of agent contexts (system prompts, tool definitions, big content blocks) are written to Anthropic's prompt cache once and then re-read at a fraction of the input-token price on subsequent calls. It targets builders using n8n, Flowise, Make.com, LangChain, LlamaIndex, and similar tools that don't natively support Anthropic prompt caching, and claims up to ~90% cost reduction and ~85% lower latency. It's a true drop-in: point your API base URL at Autocache (no code changes), and you get token-aware ROI analytics in response headers plus a /savings endpoint for aggregated metrics.

Technically, Autocache analyzes each request with approximate tokenization, inserts up to four cache "breakpoints" (system/tools/content) with configurable TTLs (default: 1h for system and tools, 5m for dynamic content), and offers conservative, moderate, and aggressive strategies. It supports streaming and non-streaming flows and multi-tenant API-key forwarding, and it emits headers like X-Autocache-Injected, X-Autocache-Cache-Ratio, and X-Autocache-ROI-Percent so you can audit savings and break-even.

Caveats: it's Anthropic-specific, doesn't cache images, approximates token counts, and is bounded by the per-request breakpoint limit. For agent-heavy, high-volume workloads it's a pragmatic, low-friction way to cut operational AI costs and surface concrete ROI.
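To make the drop-in claim concrete, here is a minimal sketch (not taken from the repository) of what usage could look like with the official anthropic Python SDK, assuming an Autocache instance listening at http://localhost:8080. The proxy address, API key, model, and prompt are placeholders; the X-Autocache-* headers and the /savings endpoint are the ones named in the summary above.

```python
# Minimal sketch, assuming a local Autocache deployment at http://localhost:8080.
# Address, key, model, and prompts are placeholders; headers and /savings come
# from the summary above.
import anthropic
import requests

client = anthropic.Anthropic(
    base_url="http://localhost:8080",  # point the SDK at the proxy instead of api.anthropic.com
    api_key="sk-ant-...",              # forwarded upstream by the proxy (multi-tenant key forwarding)
)

# with_raw_response exposes the HTTP headers, so the X-Autocache-* ROI
# headers added by the proxy can be inspected alongside the normal reply.
raw = client.messages.with_raw_response.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="...large, stable system prompt that benefits from caching...",
    messages=[{"role": "user", "content": "Summarize today's open support tickets."}],
)

print(raw.headers.get("X-Autocache-Injected"))     # whether cache breakpoints were injected
print(raw.headers.get("X-Autocache-Cache-Ratio"))  # share of the prompt covered by cache
print(raw.headers.get("X-Autocache-ROI-Percent"))  # estimated savings vs. uncached cost

message = raw.parse()
print(message.content[0].text)

# Aggregated metrics across all proxied traffic.
print(requests.get("http://localhost:8080/savings").json())
```

Under the hood, the proxy would attach `"cache_control": {"type": "ephemeral"}` markers (with the configured TTLs) to the stable system, tool, and content blocks before forwarding the request to Anthropic.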