🤖 AI Summary
Speakeasy announced Dynamic Toolsets for MCP (available now in Gram), a hybrid discovery system that cuts token use by orders of magnitude while keeping tools discoverable and agents reliable. Their benchmarks report up to 160x reductions versus static toolsets and average input-token drops of ~96% (total token reductions of 90%+), with 100% task success across experiments spanning 40–400 tools. The practical payoff: you can expose hundreds of API operations to LLM agents without blowing the context window or resorting to heavyweight "code mode" engineering, making MCP viable for production-scale AI tooling.
The technical approach splits tool interaction into three explicit primitives: search_tools (embedding-based semantic search, augmented with brief category overviews and tag filters like source:hubspot), describe_tools (lazy schema loading, so large input schemas are fetched only when needed; schemas often account for 60–80% of static toolset tokens), and execute_tool (invoke the selected tool). This yields predictable costs and scaling at the expense of more LLM calls (2–3x) and roughly 50% higher execution latency in their tests; a typical workflow uses 6–8 tool calls (search → describe → execute). Conversation history acts as a cache that reduces repeat costs. The result is a practical, production-ready MCP pattern that balances discoverability against token efficiency without forcing teams into bespoke runtime architectures.
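To make the flow concrete, here is a minimal TypeScript sketch of the three-primitive pattern. All names, shapes, and the registry contents are illustrative assumptions, not Gram's actual API; in particular, real search_tools uses embedding-based semantic search, for which simple keyword scoring stands in here.

```typescript
// Sketch of the search -> describe -> execute pattern. Everything below is
// a hypothetical illustration, not Speakeasy/Gram's implementation.

type ToolMeta = {
  name: string;
  summary: string;   // short description, always cheap to return
  tags: string[];    // e.g. "source:hubspot", usable as a filter
};

type ToolDef = ToolMeta & {
  inputSchema: object;  // large JSON Schema, loaded lazily
  run: (args: Record<string, unknown>) => Promise<unknown>;
};

const registry: ToolDef[] = [
  {
    name: "hubspot_create_contact",
    summary: "Create a new contact in HubSpot CRM",
    tags: ["source:hubspot", "crm"],
    inputSchema: { type: "object", properties: { email: { type: "string" } } },
    run: async (args) => ({ created: true, ...args }),
  },
  // ...hundreds more operations in a real deployment
];

// 1) search_tools: return only names + summaries, never full schemas.
//    A real implementation ranks by embedding similarity; this sketch
//    scores by keyword overlap and honors tag filters like source:hubspot.
function searchTools(query: string, tagFilter?: string): ToolMeta[] {
  const words = query.toLowerCase().split(/\s+/);
  return registry
    .filter((t) => !tagFilter || t.tags.includes(tagFilter))
    .map((t) => ({
      tool: t,
      score: words.filter((w) => t.summary.toLowerCase().includes(w)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map(({ tool: { name, summary, tags } }) => ({ name, summary, tags }));
}

// 2) describe_tools: fetch full input schemas only for the shortlist,
//    since schemas dominate static token cost (60-80% per the post).
function describeTools(names: string[]): { name: string; inputSchema: object }[] {
  return registry
    .filter((t) => names.includes(t.name))
    .map(({ name, inputSchema }) => ({ name, inputSchema }));
}

// 3) execute_tool: invoke the chosen tool with the supplied arguments.
async function executeTool(name: string, args: Record<string, unknown>) {
  const tool = registry.find((t) => t.name === name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.run(args);
}

// Typical agent flow: search -> describe -> execute.
async function demo() {
  const hits = searchTools("create a crm contact", "source:hubspot");
  const [schema] = describeTools(hits.map((h) => h.name));
  console.log(schema);
  console.log(await executeTool(hits[0].name, { email: "ada@example.com" }));
}

demo();
```

The key design point the sketch illustrates: only the cheap metadata from step 1 and the one or two schemas from step 2 ever enter the model's context, which is where the reported token savings come from.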