Lessons from building an intelligent LLM router (github.com)

🤖 AI Summary
Adaptive is an open-source "intelligent LLM router": an OpenAI-compatible gateway you can drop in to automatically pick the best model for each request, with reported cost savings of roughly 30–80% versus calling providers directly. It routes requests across providers (OpenAI, Anthropic/Claude, Groq, DeepSeek, Google AI, plus local/enterprise models), supports streaming and non-streaming responses, preserves Claude message formats, and integrates with common stacks (TypeScript examples, LangChain, Vercel AI SDK, ai/react hooks).

In practice, you point your existing client at the Adaptive endpoint and leave the model field empty to enable automatic routing; the project's examples show large cost reductions with no code changes beyond the endpoint swap (see the sketch below).

Technically it is a multi-component system: a Go API server (Fiber, OpenAI SDK, Redis) handles routing and caching, a Python ML service (LitServe, Hugging Face, scikit-learn) implements the model-selection logic, and a Next.js app provides the UI and analytics. Key features include dual-layer caching for near-instant repeat responses, usage and cost analytics, cost-vs-performance tuning via a cost_bias parameter, function-calling prioritization, test-mode routing, and custom model specs for enterprise/local inference. Together these make it practical for production workflows, batch/agent scenarios, and latency-sensitive apps, while enabling fine-grained tradeoffs between cost, latency, and quality.
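To make the endpoint swap concrete, here is a minimal sketch in TypeScript using the standard openai SDK. The base URL, the ADAPTIVE_API_KEY variable, and the placement and range of cost_bias are illustrative assumptions, not the project's documented API:

```typescript
import OpenAI from "openai";

// Reuse the standard OpenAI SDK, but point it at the Adaptive gateway.
// The base URL below is a placeholder for wherever the gateway is deployed.
const client = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY, // hypothetical env var name
  baseURL: "http://localhost:8080/v1",  // hypothetical Adaptive endpoint
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "", // empty model field => Adaptive picks the model per request
    messages: [
      { role: "user", content: "Explain backpressure in one paragraph." },
    ],
    // cost_bias is Adaptive-specific and not in the OpenAI types, hence the
    // cast; the 0..1 convention (0 = cheapest, 1 = highest quality) is assumed.
    cost_bias: 0.3,
  } as any);

  console.log(completion.choices[0].message.content);
}

main().catch(console.error);
```

Because the gateway speaks the OpenAI wire format, the same swap should carry over to LangChain or the Vercel AI SDK by overriding their base URL in the client configuration.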