Show HN: Fast Semantic Tool-filtering for MCP servers (github.com)

🤖 AI Summary
Portkey AI released @portkey-ai/mcp-tool-filter, a small library that uses embedding similarity to semantically filter tool lists for MCP (Model Context Protocol) servers, trimming 1,000+ tools down to the most relevant 10–20 in under 10ms. It is designed to sit in front of LLM calls and return ranked, scored tools from either a raw query string or chat context, reducing token bloat, speeding up agents, and preserving privacy when local embeddings are used.

You can run fully offline with local Xenova models (the default is a quantized Xenova/all-MiniLM-L6-v2) or opt into higher-accuracy API embeddings (OpenAI). Tuning options include topK, minScore, contextMessages, alwaysInclude/exclude, and includeServerDescription; a usage sketch follows below.

The library emphasizes micro-optimizations for real-time use: loop-unrolled dot products (6–8× speedup), in-place normalization, a hybrid top-K selector that switches to a heap above 500 tools, a true LRU cache, and O(1) set-based exclusions (see the kernel and selection sketches further down). Typical end-to-end numbers: local mode filters ~1,000 tools in 1–5ms (<1ms when cached) and initializes in roughly 0.1–4s depending on model size; API embeddings improve accuracy by roughly 5–15% but add 400–800ms of latency plus API cost and privacy trade-offs. Built-in timing, caching, and model options (384–768 embedding dimensions, or custom dimensions for API models) let teams trade off speed, accuracy, and cost, which makes the library useful for practitioners building responsive, scalable tool-enabled agents.
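As a rough illustration of wiring such a filter in front of an LLM call, here is a minimal TypeScript sketch. The option names (topK, minScore, alwaysInclude, includeServerDescription) come from the summary above, but the ToolFilter export name, constructor shape, and filter() signature are assumptions for illustration; consult the repo's README for the actual API.

```typescript
// Hypothetical usage sketch -- real export names and signatures may differ.
import { ToolFilter } from "@portkey-ai/mcp-tool-filter"; // assumed export name

// MCP-style tool descriptors (shape assumed for illustration).
const tools = [
  { name: "search_flights", description: "Search for flights between two airports" },
  { name: "book_hotel", description: "Reserve a hotel room for given dates" },
  // ... potentially 1,000+ more tools aggregated from MCP servers
];

const filter = new ToolFilter({
  // Local mode: quantized MiniLM model, runs fully offline (the default per the summary).
  model: "Xenova/all-MiniLM-L6-v2",
  topK: 15,                       // keep the 15 highest-scoring tools
  minScore: 0.3,                  // drop anything below this similarity score
  alwaysInclude: ["book_hotel"],  // pinned into the result regardless of score
  includeServerDescription: true, // fold the server's description into the tool text
});

// Filter against a raw user query; per the summary, chat context also works.
const ranked = await filter.filter("find me a cheap flight to Tokyo", tools);
for (const { name, score } of ranked) {
  console.log(`${name}: ${score.toFixed(3)}`); // assumed return shape: name + score
}
```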
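The numeric micro-optimizations named above are standard kernel tricks. The sketch below shows what a 4-way loop-unrolled dot product over Float32Array vectors and in-place L2 normalization typically look like; this is a generic illustration of the technique, not the library's actual code.

```typescript
// Generic sketch of the numeric kernels described above; not the library's code.

// In-place L2 normalization: once vectors are unit length, cosine similarity
// reduces to a plain dot product, so no extra work is needed per comparison.
function normalizeInPlace(v: Float32Array): void {
  let sum = 0;
  for (let i = 0; i < v.length; i++) sum += v[i] * v[i];
  const inv = 1 / Math.sqrt(sum);
  for (let i = 0; i < v.length; i++) v[i] *= inv;
}

// 4-way loop-unrolled dot product. Independent accumulators let the JIT keep
// several multiply-adds in flight instead of serializing on a single sum.
function dot(a: Float32Array, b: Float32Array): number {
  const n = a.length;
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  let i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  for (; i < n; i++) s0 += a[i] * b[i]; // tail for lengths not divisible by 4
  return s0 + s1 + s2 + s3;
}
```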
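The hybrid top-K selector likewise maps onto a well-known pattern: for small candidate sets, sorting indices is cheapest, while past some threshold (the summary says >500 tools) a fixed-size min-heap wins because it runs in O(n log k) rather than O(n log n). A sketch of that switch, with hypothetical names and the threshold taken from the summary:

```typescript
// Sketch of hybrid top-K selection over a score array; names are hypothetical.

const HEAP_THRESHOLD = 500; // per the summary, the heap path kicks in above ~500 tools

function topK(scores: Float32Array, k: number): number[] {
  const n = scores.length;
  if (n <= HEAP_THRESHOLD) {
    // Small inputs: sort all indices by score descending and slice.
    return Array.from(scores.keys())
      .sort((a, b) => scores[b] - scores[a])
      .slice(0, k);
  }
  // Large inputs: size-k min-heap of indices keyed by score -- O(n log k).
  const heap: number[] = []; // heap[0] holds the smallest score among the kept k
  const less = (a: number, b: number) => scores[heap[a]] < scores[heap[b]];
  const swap = (a: number, b: number) => { const t = heap[a]; heap[a] = heap[b]; heap[b] = t; };
  const siftDown = (i: number) => {
    for (;;) {
      let m = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < heap.length && less(l, m)) m = l;
      if (r < heap.length && less(r, m)) m = r;
      if (m === i) return;
      swap(i, m); i = m;
    }
  };
  const siftUp = (i: number) => {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (!less(i, p)) return;
      swap(i, p); i = p;
    }
  };
  for (let i = 0; i < n; i++) {
    if (heap.length < k) { heap.push(i); siftUp(heap.length - 1); }
    else if (scores[i] > scores[heap[0]]) { heap[0] = i; siftDown(0); } // evict current minimum
  }
  return heap.sort((a, b) => scores[b] - scores[a]); // best-first order
}
```

On the heap path, most tools cost a single comparison against the current minimum; only genuine top-K candidates pay the log k sift, which is what makes this approach cheap at 1,000+ tools.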