Show HN: LLM-Use – An LLM router that chooses the right model for each prompt (github.com)

🤖 AI Summary
LLM‑Use is an open‑source, production‑grade "LLM router" that automatically picks the best model for each prompt based on task complexity, cost limits and measured quality. It bundles real‑time streaming (SSE + async generators), A/B testing with statistical significance (t‑tests, effect sizes, confidence intervals), continuous quality scoring (semantic similarity, grammar, coherence), circuit breakers and automatic fallback chains. The project is aimed at teams that need to balance latency, cost and accuracy across multi‑provider fleets (OpenAI, Anthropic, Groq, Google, Ollama or custom providers) while maintaining observability and SLAs.

Technically, LLM‑Use computes a complexity score from linguistic features to route requests (trading off speed against quality), uses embedding‑based similarity and NLP tools (spaCy, SentenceTransformers, LanguageTool) for multi‑dimensional scoring, and persists experiments to SQLite for durable analysis. It supports stream caching, LRU+TTL response caching, Prometheus metrics, FastAPI endpoints, and a benchmarking suite that measures latency, tokens/sec, and quality across math/reasoning/code/creative tasks.

Enterprise features include audit logging, compliance checks and cost tracking, and the repo provides Docker/Kubernetes manifests for scaling. For ML/ops teams this gives a practical framework to automate model selection, run statistically rigorous model comparisons, and enforce reliability and cost constraints in production.
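As a concrete illustration of the routing step, here is a minimal sketch of complexity‑based model selection under a cost cap. The linguistic features, weights, thresholds, model tiers and per‑token prices below are assumptions for illustration, not LLM‑Use's actual values.

```python
# Hypothetical sketch of complexity-based routing; the features, weights,
# thresholds, tiers and prices are illustrative, not LLM-Use's real ones.
import re
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # assumed pricing, used for the cost cap

TIERS = [  # ordered cheap/fast -> expensive/strong
    ModelTier("llama-3.1-8b", 0.0001),
    ModelTier("gpt-4o-mini", 0.0006),
    ModelTier("claude-3.5-sonnet", 0.0030),
]

def complexity_score(prompt: str) -> float:
    """Map crude linguistic features of the prompt to a score in [0, 1]."""
    words = prompt.split()
    avg_word_len = sum(map(len, words)) / max(len(words), 1)
    sentences = max(len(re.findall(r"[.!?]+", prompt)), 1)
    words_per_sentence = len(words) / sentences
    has_code = 1.0 if re.search(r"\bdef |\bclass |SELECT |\{", prompt) else 0.0
    has_reasoning = 1.0 if re.search(
        r"\b(prove|derive|step by step|why)\b", prompt, re.I) else 0.0
    score = (0.2 * min(avg_word_len / 10, 1)
             + 0.3 * min(words_per_sentence / 40, 1)
             + 0.2 * has_code
             + 0.3 * has_reasoning)
    return min(score, 1.0)

def route(prompt: str, max_cost_per_1k: float = 0.01) -> ModelTier:
    """Pick the cheapest tier whose capability matches the prompt's
    complexity, subject to the caller's cost ceiling."""
    idx = min(int(complexity_score(prompt) * len(TIERS)), len(TIERS) - 1)
    for tier in TIERS[idx:]:
        if tier.cost_per_1k_tokens <= max_cost_per_1k:
            return tier
    return TIERS[idx]  # all capable tiers exceed the cap: keep the match

print(route("Prove step by step that sqrt(2) is irrational.").name)
```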
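The A/B‑testing piece reduces to standard two‑sample statistics. A minimal sketch using SciPy, with placeholder quality scores rather than real benchmark output:

```python
# Hypothetical A/B comparison of two models' per-response quality scores;
# the sample data is placeholder, the statistics are standard.
import math
from scipy import stats

def compare_models(quality_a, quality_b, alpha=0.05):
    """Two-sample t-test plus Cohen's d and a CI on the mean difference."""
    n1, n2 = len(quality_a), len(quality_b)
    m1, m2 = sum(quality_a) / n1, sum(quality_b) / n2
    v1 = sum((x - m1) ** 2 for x in quality_a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in quality_b) / (n2 - 1)
    t_stat, p_value = stats.ttest_ind(quality_a, quality_b)
    # Effect size: difference in means over the pooled standard deviation
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    cohens_d = (m1 - m2) / pooled_sd
    # Confidence interval on the difference in means
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
    ci = ((m1 - m2) - t_crit * se, (m1 - m2) + t_crit * se)
    return {"p_value": p_value, "cohens_d": cohens_d,
            "ci": ci, "significant": p_value < alpha}

# Placeholder quality scores in [0, 1] for two models on the same prompts
a = [0.82, 0.79, 0.88, 0.75, 0.91, 0.84, 0.80, 0.86]
b = [0.74, 0.71, 0.78, 0.69, 0.80, 0.73, 0.76, 0.72]
print(compare_models(a, b))
```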
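The LRU+TTL response cache it lists is a well‑known pattern; a minimal sketch on top of OrderedDict, with illustrative capacity and TTL defaults:

```python
# Hypothetical LRU+TTL response cache; capacity and TTL are illustrative.
import time
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, capacity=1024, ttl=300.0):
        self.capacity = capacity
        self.ttl = ttl               # seconds before an entry expires
        self._store = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        stored_at, value = item
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]         # expired: evict and report a miss
            return None
        self._store.move_to_end(key)     # refresh LRU position on a hit
        return value

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (time.monotonic(), value)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUTTLCache(capacity=2, ttl=60)
cache.put("prompt-hash", "cached completion")
print(cache.get("prompt-hash"))  # -> "cached completion"
```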
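Likewise, the circuit‑breaker‑plus‑fallback‑chain behavior might look roughly like the sketch below; the failure threshold, cooldown and call_provider stub are assumptions, not the repo's actual interface.

```python
# Hypothetical circuit breaker with a fallback chain; the threshold,
# cooldown and provider stub are illustrative assumptions.
import time

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds before a retry is allowed
        self.failures = 0
        self.opened_at = 0.0

    def available(self):
        if self.failures < self.threshold:
            return True  # closed: requests flow normally
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # open the breaker

def call_provider(name, prompt):
    """Stub standing in for a real provider SDK call."""
    raise NotImplementedError

def complete(prompt, chain, breakers):
    """Try providers in order, skipping any whose breaker is open."""
    for name in chain:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if not breaker.available():
            continue  # breaker open: skip without issuing a request
        try:
            result = call_provider(name, prompt)
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
    raise RuntimeError("all providers in the fallback chain failed")
```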