Build Efficient MCP Servers: Three Design Principles (www.damiangalarza.com)

🤖 AI Summary
An engineer building a Model Context Protocol (MCP) wrapper for a YNAB budget discovered that naïvely exposing full API responses to LLMs quickly exhausts context and degrades accuracy. MCPs let models call tools, read resources, and receive prompts, but every JSON key and value eats tokens: Claude Sonnet 4.5’s ~200k-token window sounds large until a year of transactions can be 746,800 tokens, or tool definitions alone consume ~47.9k tokens (24% of context).

The author measured concrete savings: returning only six essential account fields (id, name, type, on_budget, closed, balance) reduced a 47-account response from ~9,960 tokens to ~3,451 (≈65% reduction); a full budget overview dropped from ~30,405 to ~18,879 tokens (≈38% reduction), freeing room for multi-turn reasoning and memory.

From that work came three practical design principles for efficient MCPs:

1. Expose only the fields the model actually needs: filter and aggregate server-side rather than proxying entire API payloads.
2. Move domain-specific filtering and normalization into tool code (e.g., exclude hidden categories) so the model isn’t burdened with bookkeeping.
3. Minimize verbose tool descriptions and unnecessary data to preserve context for dialogue and chains of tool calls.

Applied together, these principles reduce token cost, improve model accuracy by removing noisy internal fields, and make tool-assisted workflows reliable and scalable.
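A minimal sketch of the first two principles in Python. The six account field names come from the article; the raw payload shapes and the helper names (`slim_accounts`, `visible_categories`) are illustrative assumptions, not the author's actual implementation or the exact YNAB API schema:

```python
# Principle (1): return only the fields the model needs, not the full payload.
# The six keys below are the "essential account fields" named in the summary.
ESSENTIAL_ACCOUNT_FIELDS = ("id", "name", "type", "on_budget", "closed", "balance")


def slim_accounts(raw_accounts: list[dict]) -> list[dict]:
    """Project each raw account record down to the essential fields."""
    return [
        {k: acct[k] for k in ESSENTIAL_ACCOUNT_FIELDS if k in acct}
        for acct in raw_accounts
    ]


# Principle (2): do domain-specific filtering in tool code, so the model
# never sees bookkeeping records (e.g. hidden or deleted categories).
def visible_categories(raw_categories: list[dict]) -> list[dict]:
    return [
        cat for cat in raw_categories
        if not cat.get("hidden") and not cat.get("deleted")
    ]


# Illustrative raw response: a real account record would carry many more
# internal fields than a conversation with the model ever needs.
raw_accounts = [{
    "id": "a1", "name": "Checking", "type": "checking",
    "on_budget": True, "closed": False, "balance": 123450,
    # noisy internal fields the model never needs:
    "note": None, "cleared_balance": 123450,
    "uncleared_balance": 0, "deleted": False,
}]

slim = slim_accounts(raw_accounts)
# Each slimmed record now carries 6 keys instead of 10; across a
# 47-account response, that projection is where the token savings come from.
```

The key design choice is that the projection and the hidden-category rule live in the MCP server, so every tool call pays the reduced token cost automatically instead of relying on the model to ignore noise.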