llms.py – Lightweight OpenAI Chat/Image/Audio Client and Server (github.com)

🤖 AI Summary
llms.py is a single-file, lightweight Python CLI and OpenAI-compatible server for querying many LLM providers through one unified interface. It exposes an OpenAI-style HTTP endpoint (POST /v1/chat/completions) for local deployment, plus a CLI with options for model selection, system prompts, raw JSON requests, and serving on a specified port. The tool supports image and audio inputs (with automatic download, format detection, and base64 encoding), auto-discovers local Ollama models, and can mix local models with cloud providers like OpenAI, Google, Anthropic, OpenRouter, Grok, Groq, Qwen, and Mistral, covering 160+ models. It’s implemented as a single llms.py file with one dependency (aiohttp) and is available via direct download or pip (lapi).

Technically, llms.py uses a JSON config (~/.llms/llms.json) to map unified model names to provider-specific names, set headers and default chat templates, and enable or disable providers. Requests are routed to the first enabled provider that supports the requested model (you control the ordering to prefer free, cheapest, or local providers), and failures are automatically retried on the next provider for resilience and cost control.

This makes it easy to build hybrid local/cloud pipelines, fall back between providers transparently, reuse existing OpenAI-compatible clients, and experiment with many models without rewriting code or integrations.
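As a concrete illustration of the endpoint described above, here is a minimal sketch of posting a chat completion to a locally running llms.py server. Only the POST /v1/chat/completions route comes from the summary; the port (8000) and the model name are illustrative assumptions.

```python
# Minimal sketch: call the local OpenAI-compatible endpoint.
# Assumptions: server running on port 8000; "llama3" is a placeholder
# for whatever unified model name your config defines.
import json
import urllib.request

payload = {
    "model": "llama3",  # unified name; routed to the first enabled provider
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize llms.py in one sentence."},
    ],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # route per the summary
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Standard OpenAI-style response shape: choices[0].message.content
print(body["choices"][0]["message"]["content"])
```

Because the server mimics the OpenAI API, any existing OpenAI-compatible client can be pointed at this base URL unchanged.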
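For the image support mentioned above, OpenAI-compatible chat requests carry images as base64 data URIs inside content parts; the sketch below shows that standard shape. The file name is illustrative, and whether llms.py builds exactly this structure internally is an assumption based on the summary's mention of base64 encoding.

```python
# Sketch of an OpenAI-style multimodal message: the image is read
# locally and embedded as a base64 data URI (file name is illustrative).
import base64

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ],
}
```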
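The summary names the config's responsibilities (enabling/disabling providers, mapping unified model names to provider-specific ones, headers, default chat templates) but not its schema, so every key below is a hypothetical stand-in, written from Python to show the intent.

```python
# Hypothetical sketch of what ~/.llms/llms.json might express.
# All key names are assumptions, not the actual llms.py schema.
import json
import os

config = {
    "defaults": {
        # default chat template values applied when a request omits them
        "temperature": 0.7,
        "max_tokens": 1024,
    },
    "providers": [
        {
            "name": "ollama",           # locally discovered models
            "enabled": True,            # listed first => preferred (free/local)
            "base_url": "http://localhost:11434/v1",
            "models": {"llama3": "llama3:latest"},  # unified -> provider-specific
        },
        {
            "name": "openrouter",
            "enabled": True,            # fallback if the local provider fails
            "base_url": "https://openrouter.ai/api/v1",
            "headers": {"Authorization": "Bearer <OPENROUTER_API_KEY>"},  # placeholder
            "models": {"llama3": "meta-llama/llama-3-70b-instruct"},
        },
    ],
}

path = os.path.expanduser("~/.llms/llms.json")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    json.dump(config, f, indent=2)
```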
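Finally, the first-enabled-provider-with-fallback routing can be pictured as a short loop over the provider list. This is a sketch of the behavior the summary describes, not llms.py's actual implementation; `send` is a stand-in HTTP helper.

```python
# Sketch of routing with provider fallback: try each enabled provider
# that supports the requested model, in config order, and fall through
# to the next one on failure.
import json
import urllib.request

def send(provider, request):
    """POST an OpenAI-style chat request to one provider (stand-in helper)."""
    req = urllib.request.Request(
        provider["base_url"] + "/chat/completions",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json", **provider.get("headers", {})},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def route(request, providers):
    errors = []
    for provider in providers:                        # config order = preference order
        if not provider["enabled"]:
            continue
        if request["model"] not in provider["models"]:
            continue
        try:
            # translate the unified model name to the provider-specific one
            provider_request = dict(request, model=provider["models"][request["model"]])
            return send(provider, provider_request)
        except Exception as exc:                      # provider failed: retry on the next
            errors.append((provider["name"], exc))
    raise RuntimeError(f"all providers failed: {errors}")
```

Putting cheap or local providers first in the list is what gives the cost-control behavior the summary describes: paid providers are only reached when earlier ones fail or lack the model.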