Claude Code Cache Bugs Can Increase API Costs 10-20x

A post in the r/ClaudeCode community on Reddit reports two cache-related bugs in Claude Code. According to the post, these bugs can silently increase API costs to 10-20 times the expected amount.
Source Details
The information comes from a Reddit post titled "PSA: Claude Code has two cache bugs that can silently 10-20x API costs" in r/ClaudeCode. The post was also picked up on Hacker News, where it had 27 points and 3 comments at the time of reporting.
Cache bugs in AI coding assistants like Claude Code are particularly problematic because they determine how much of each request is billed at the discounted cached rate. When caching fails, context that should be read from cache is processed again at full price on every request, so costs rise without any visible change in functionality.
Technical Context
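AI coding assistants typically rely on caching to avoid paying full price for context that repeats from one request to the next. Claude Code resends much of the same material on every turn (system prompt, tool definitions, conversation history), and the Anthropic API's prompt caching bills a previously seen prompt prefix at a reduced rate when the prefix matches exactly. What is cached is the input prefix, not the generated output. A bug that alters the prefix or otherwise prevents a cache hit defeats this optimization, so the full context is billed as new input on requests that should have been served largely from cache.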
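To make the scale of the effect concrete, the following is a rough sketch of the arithmetic, not a description of the reported bugs. It assumes a fixed context resent on every turn, ignores output tokens, and uses illustrative price multipliers (cache writes at 1.25x and cache reads at 0.1x the base input price) that should be checked against the current price list. The function name and the example figures are hypothetical.

```python
# Illustrative multipliers relative to the base input-token price.
# Assumption: these match the published prompt-caching pricing for the
# default short-lived cache; verify against the current price list.
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


def session_input_cost(turns, context_tokens, base_price_per_mtok, cache_works):
    """Estimate input cost for a session that resends a fixed context each turn.

    Simplifications: the context does not grow and output tokens are ignored.
    """
    if turns <= 0:
        return 0.0
    per_token = base_price_per_mtok / 1_000_000
    write_cost = context_tokens * per_token * CACHE_WRITE_MULTIPLIER
    read_cost = context_tokens * per_token * CACHE_READ_MULTIPLIER
    if cache_works:
        # The first turn writes the cache; every later turn reads it.
        return write_cost + (turns - 1) * read_cost
    # The cache never hits: every turn pays the write price again.
    return turns * write_cost


if __name__ == "__main__":
    # Hypothetical session: 50 turns, 100k-token context, $3 per million tokens.
    healthy = session_input_cost(50, 100_000, 3.0, cache_works=True)
    broken = session_input_cost(50, 100_000, 3.0, cache_works=False)
    print(f"healthy: ${healthy:.2f}  broken: ${broken:.2f}  "
          f"ratio: {broken / healthy:.1f}x")
```

Under these assumptions the broken session costs roughly 10 times as much as the healthy one, and the ratio approaches 12.5x as the number of turns grows, which is at least consistent in magnitude with the range the post reports.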
Developers using Claude Code should monitor API usage and costs, especially in long sessions or repetitive coding tasks, where caching should provide the most benefit and a silent failure is therefore the most expensive.
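One way to approach such monitoring is to track what share of input tokens is actually served from cache. The sketch below assumes usage records are already available as dictionaries carrying the `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields that the Anthropic Messages API reports in its `usage` object. How those records are collected from Claude Code is not covered here, and the helper names and the 0.5 threshold are hypothetical choices for illustration.

```python
def cache_read_ratio(usage_records):
    """Return the share of input tokens that were read from cache (0.0 to 1.0).

    Returns 0.0 when the records contain no input tokens at all.
    """
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usage_records)
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usage_records)
    uncached = sum(u.get("input_tokens") or 0 for u in usage_records)
    total = read + written + uncached
    return read / total if total else 0.0


def looks_unhealthy(usage_records, min_ratio=0.5):
    """Flag a session whose cache-read share is below an arbitrary threshold."""
    return cache_read_ratio(usage_records) < min_ratio


if __name__ == "__main__":
    # Hypothetical three-turn session where the cache is hit after turn one.
    session = [
        {"input_tokens": 20, "cache_creation_input_tokens": 10_000, "cache_read_input_tokens": 0},
        {"input_tokens": 20, "cache_creation_input_tokens": 500, "cache_read_input_tokens": 10_000},
        {"input_tokens": 20, "cache_creation_input_tokens": 500, "cache_read_input_tokens": 10_500},
    ]
    print(f"cache read ratio: {cache_read_ratio(session):.2f}")
    print("unhealthy" if looks_unhealthy(session) else "healthy")
```

A session in which `cache_creation_input_tokens` stays high on every turn while `cache_read_input_tokens` stays near zero is the pattern to look for. A suitable threshold depends on session length, since short sessions naturally have a lower read share.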
📖 Read the full source: HN AI Agents