Claude Prompt Cache Diagnostics: Stats Thread Reveals 98.9% Cache Read Ratio

Two days ago, Anthropic released the prompt cache diagnostics feature in Claude Console. It's a tool for developers to understand why a request misses the cache and to reduce costs. One developer (u/samuelroy_) shared their stats in a community thread, aiming to find patterns and improve cache performance across the board.
Key Stats from the Source
- Overall cache read ratio: 98.9%
- 80% of cache misses are due to
messages changed. - Write amortization for Sonnet: 3.69x
The developer noted that their project harness is designed to only append messages in history, making the high miss rate from messages changed surprising. The likely explanation is users forking conversations, which changes the message chain.
What This Means
Prompt caching reduces cost and latency. With a 98.9% read ratio, the developer is already efficient, but the diagnostic data reveals a clear area for improvement: reducing unnecessary message changes. If you see similar patterns, auditing how conversations are forked or edited could boost cache hit rates.
For reference, write amortization (3.69x for Sonnet) indicates how many times a cache entry is written relative to reads. A lower value is better.
First-party analytics like this are a step forward for AI API cost optimization. Other providers are expected to follow.
📖 Read the full source: r/ClaudeAI
👀 See Also

OpenClaw AI Agent Halts Operations After Atomic Append Failure
An OpenClaw agent entered a state of functional paralysis after failing an atomic append test, refusing to continue any operations due to fundamental untrustworthiness.

The Need for Relational Governance in Multi-Agent AI Systems
Current governance frameworks focus on identity, permissions, and kill switches, but fail to address coordination between agents. Salesforce's research shows agent-to-agent interactions require purpose-built solutions, while studies reveal warmth outperforms dominance in negotiations.

OpenClaw's Context Management Criticized as Token-Intensive and Architecturally Flawed
A Reddit post criticizes OpenClaw for inefficient context handling that leads to excessive token usage. The framework appends all actions to global history, creating bloated prompts that overwhelm smaller models and force reliance on expensive frontier models like Claude Opus.

AI Agents That Don't Slash Maintenance Costs Will Sink Your Team
James Shore argues that doubling AI coding speed without halving maintenance costs leads to net productivity loss within months. Model shows 2x code output with 2x maintenance cost per line yields productivity worse than starting point after ~5 months.