A 429 from a quota cap and a 429 from rate-limit need different cooldowns (github.com)

🤖 AI Summary
A new Python library called Resilient LLM Router has been released, aimed at routing calls across multiple large language model (LLM) providers such as OpenAI and Anthropic. It tracks three states per provider — rate limit, quota, and circuit health — so it can decide whether a call should proceed or be skipped based on how a provider actually failed, with particular attention to HTTP 429 responses. Unlike routers that apply a one-size-fits-all cooldown, this library keeps separate cooldowns for transient rate limits and for quota exhaustion, which can improve API utilization and response latency.

The distinction matters because the two failure modes call for different recovery behavior: a transiently throttled provider is usually usable again within seconds, while an exhausted quota stays exhausted until it resets, so retrying it only burns requests. By maintaining distinct cooldown periods per failure mode, the router avoids unnecessary retries against exhausted quotas. State can be kept in multiple backends — in-memory, SQLite, or Postgres — and an optional probe daemon periodically checks provider status. Together these give developers finer control over provider interactions and more reliable AI applications.
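To make the core idea concrete, here is a minimal Python sketch of per-provider state with separate cooldowns for the two 429 modes. All names (`ProviderState`, `classify_429`, the header heuristic, and the cooldown values) are illustrative assumptions, not the library's actual API.

```python
import time
from dataclasses import dataclass

# Hypothetical cooldowns: transient throttling recovers in seconds,
# an exhausted quota should be backed off for much longer.
RATE_LIMIT_COOLDOWN = 5.0     # seconds
QUOTA_COOLDOWN = 3600.0       # seconds

@dataclass
class ProviderState:
    rate_limited_until: float = 0.0
    quota_blocked_until: float = 0.0
    circuit_open: bool = False

    def available(self, now: float) -> bool:
        # A provider is eligible only if no cooldown is active
        # and its circuit breaker is closed.
        return (not self.circuit_open
                and now >= self.rate_limited_until
                and now >= self.quota_blocked_until)

def classify_429(headers: dict) -> str:
    """Guess the failure mode from response headers (heuristic;
    assumes header names are lowercased)."""
    # Heuristic assumption: zero remaining tokens with no Retry-After
    # suggests a hard quota cap rather than transient throttling.
    if (headers.get("x-ratelimit-remaining-tokens") == "0"
            and "retry-after" not in headers):
        return "quota"
    return "rate_limit"

def record_429(state: ProviderState, headers: dict, now: float) -> None:
    # Apply the cooldown that matches the failure mode.
    if classify_429(headers) == "quota":
        state.quota_blocked_until = now + QUOTA_COOLDOWN
    else:
        retry_after = float(headers.get("retry-after", RATE_LIMIT_COOLDOWN))
        state.rate_limited_until = now + retry_after

def pick_provider(states: dict[str, ProviderState]) -> str | None:
    now = time.monotonic()
    for name, state in states.items():
        if state.available(now):
            return name
    return None  # all providers cooling down: fail fast, don't retry
```

The key design point the sketch illustrates: a single shared cooldown would either retry an exhausted quota too eagerly or sideline a briefly throttled provider for far too long; tracking the two timestamps independently avoids both failure modes.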