🤖 AI Summary
The article explains MCP sampling, a protocol feature that lets an MCP server ask an MCP client to run an inference request on the client's own LLM and return the result. That shifts token spend off the server (useful for formatting aggregator JSON into human-readable answers in agentic workflows such as internal travel booking), but it creates a major attack surface: a malicious or misconfigured server can trigger large LLM consumption on the client's bill ("Denial of Wallet"), an instance of OWASP's Unbounded Consumption risk. The author demonstrates this with a JSON-RPC sampling request and a small test in which generating a few hundred city names used ~4.1K tokens (~$0.0615 on claude-3-5-sonnet), illustrating how quickly costs scale at production volume.
Mitigation is technically possible but fragmented: clients can decline to advertise the sampling capability, enforce a client-side max_tokens cap (e.g., via model API parameters), or require human approval for each sampling request. Policy can restrict sampling to trusted internal servers, but that relies on developer compliance, since the MCP spec says "SHOULD" rather than "MUST." A stronger, centrally enforceable option is a full MCP proxy that terminates and mediates MCP connections (unlike a simple network proxy), enabling fine-grained controls such as per-tool sampling rules. As MCP adoption grows, sampling should be treated as a first-class security and cost-control consideration in AI/ML deployments.