🤖 AI Summary
The recent Claude Code releases, versions 2.1.100 and 2.1.101, consume significantly more cache_creation_input_tokens: roughly 20,000 extra tokens per API request compared with version 2.1.98. The inflation occurs even though the request payload contains fewer bytes, and it is likely tied to server-side changes. The result is not just higher cost; the extra tokens also take up space in the model's context window, potentially degrading output quality.
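To see where these tokens show up, the usage block returned by the Anthropic Messages API reports cache_creation_input_tokens alongside the regular input and output counts. The sketch below is a minimal, hypothetical way to log that figure per request so it can be compared across client versions; it assumes the official `anthropic` Python SDK with prompt caching enabled via `cache_control`, and the model name and prompts are purely illustrative.

```python
# Minimal sketch: log cache_creation_input_tokens per request.
# Assumes the official `anthropic` Python SDK; model name and prompt are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant.",
            "cache_control": {"type": "ephemeral"},  # mark this block for prompt caching
        }
    ],
    messages=[{"role": "user", "content": "Summarize the last change."}],
)

usage = response.usage
# cache_* fields may be absent when no cache blocks are involved, so read them defensively.
print("input_tokens:", usage.input_tokens)
print("output_tokens:", usage.output_tokens)
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```

Logging these counts before and after upgrading the CLI is one way to confirm whether a given version inflates cache creation on otherwise identical requests.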
The implications for the AI/ML community are concerning. The extra cache_creation_input_tokens dilute user-provided instructions and shrink the effective context available to a conversation, which can lead to inconsistent agent behavior and make debugging harder. Community discussion has stressed the urgency of the issue: users may exhaust their usage limits faster without understanding why performance has degraded. Until a fix lands, the recommended workaround is to revert to version 2.1.98, though the episode raises broader concerns about transparency and quality in AI interactions going forward.
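For the workaround, Claude Code is distributed as the npm package @anthropic-ai/claude-code, so reverting is typically a matter of installing version 2.1.98 explicitly. The snippet below is a hedged sketch that checks whether the locally installed CLI reports an affected version; the `claude --version` call and its output format are assumptions, so adjust the parsing to whatever your installation actually prints.

```python
# Hedged sketch: warn if the installed Claude Code CLI is one of the affected versions.
# Assumes `claude --version` prints a string containing the semantic version (format assumed).
import re
import subprocess

AFFECTED = {"2.1.100", "2.1.101"}
RECOMMENDED = "2.1.98"

out = subprocess.run(["claude", "--version"], capture_output=True, text=True, check=True).stdout
match = re.search(r"\d+\.\d+\.\d+", out)
version = match.group(0) if match else "unknown"

if version in AFFECTED:
    print(f"Claude Code {version} is affected; consider pinning to {RECOMMENDED}, e.g.")
    print(f"  npm install -g @anthropic-ai/claude-code@{RECOMMENDED}")
else:
    print(f"Claude Code {version} is not in the affected list ({', '.join(sorted(AFFECTED))}).")
```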