Analysis of 100M tokens in Claude Code reveals 99.4% input usage

Token usage breakdown from 100M tokens tracked
A detailed analysis of Claude Code usage tracked 1,289 requests across extended coding sessions, totaling approximately 100.9M tokens. The breakdown reveals a significant imbalance between input and output tokens.
Token distribution:
- Input tokens: 100.3M (99.4% of total)
- Cached tokens: 84.2M (84% of input; a subset of the input total, not an additional category)
- Output tokens: 616K (0.6% of total)
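The percentages above can be reproduced from the reported counts. The following is a minimal Python sketch that uses only the figures quoted in this breakdown; the variable names and the `share` helper are illustrative and not part of the original analysis.

```python
# Figures as reported in the breakdown above (token counts).
input_tokens = 100_300_000
cached_tokens = 84_200_000   # subset of input_tokens
output_tokens = 616_000
requests = 1_289

total = input_tokens + output_tokens


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


input_share = share(input_tokens, total)           # input as a share of all tokens
output_share = share(output_tokens, total)         # output as a share of all tokens
cached_share = share(cached_tokens, input_tokens)  # cached as a share of input
avg_input_per_request = round(input_tokens / requests)

print(input_share, output_share, cached_share, avg_input_per_request)
```

Dividing the input total by the request count also gives a rough average of input tokens per request, which is a derived figure rather than one stated in the source.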
The context re-reading bottleneck
Claude Code spends 99.4% of its token budget reading context and only 0.6% producing output such as code. According to the analysis, this pattern is not specific to Claude Code but reflects how agentic coding systems generally operate today. Every time Claude Code makes a move (reading a file, running a command, editing code), the full context must be fed back in, including repository structure, conversation history, tool results, and error logs.
The 84.2M cached tokens are context that had already been sent in an earlier request and was sent again, spread across the 1,289 requests (about 65K cached tokens per request on average), because the model has no persistent memory between turns. Unlike human developers, who maintain a mental model of their codebase, Claude Code follows a loop: forget everything → re-read everything → write code → forget everything again.
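To illustrate why re-sending context each turn skews the ratio so heavily toward input, here is a small Python simulation of a stateless agent loop. It is a sketch under stated assumptions, not a model of Claude Code's actual internals: the function `simulate_session` and all of its parameter values (100 turns, a 20,000-token starting context, 1,000 tokens of tool results and 500 output tokens per turn) are hypothetical.

```python
def simulate_session(turns, base_context, tokens_added_per_turn, output_per_turn):
    """Simulate a stateless loop that re-sends the full context on every turn.

    All parameters are hypothetical values chosen for illustration.
    Returns (total_input_tokens, total_output_tokens).
    """
    context = base_context
    total_input = 0
    total_output = 0
    for _ in range(turns):
        total_input += context          # the whole context is read again
        total_output += output_per_turn
        # Tool results and the model's own output are appended to the history,
        # so the next turn has a larger context to re-read.
        context += tokens_added_per_turn + output_per_turn
    return total_input, total_output


total_input, total_output = simulate_session(
    turns=100,
    base_context=20_000,
    tokens_added_per_turn=1_000,
    output_per_turn=500,
)
input_share = 100 * total_input / (total_input + total_output)
print(total_input, total_output, round(input_share, 1))
```

Because each turn re-reads everything accumulated so far, total input grows roughly with the square of the number of turns while output grows linearly, which is one plausible explanation for the kind of imbalance reported above.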
Prompt caching limitations
The analysis argues that Anthropic's prompt caching makes this process cheaper but does not remove the underlying work: the bottleneck, in this view, is not inference speed but the re-reading loop itself. It suggests that the real unlock for Claude Code, and for agentic coding in general, would be persistent project memory: not just saved facts in memory files or CLAUDE.md, but a compressed, evolving understanding of the codebase that carries forward across sessions.
Current systems essentially brute-force intelligence through repeated context instead of building understanding. If that changes, AI coding could become genuinely faster, because the same information would no longer need to be processed repeatedly.
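The "cheaper, but the same volume" point can be sketched with a simple cost calculation. The price and discount below are assumptions for illustration only: `PRICE_PER_MTOK` is a placeholder rather than a quoted rate, and `CACHE_READ_MULTIPLIER` assumes cached reads are billed at a fraction of the base input price. The sketch also ignores any extra charge for writing to the cache, so check current pricing before relying on the numbers.

```python
def input_cost(input_tokens, cached_tokens, price_per_mtok, cache_read_multiplier):
    """Estimate input cost in dollars when cached tokens are billed at a discount.

    Simplified model: ignores cache-write surcharges and output tokens.
    """
    uncached = input_tokens - cached_tokens
    billed = uncached * price_per_mtok + cached_tokens * price_per_mtok * cache_read_multiplier
    return billed / 1_000_000


PRICE_PER_MTOK = 3.0          # hypothetical base input price, dollars per million tokens
CACHE_READ_MULTIPLIER = 0.1   # assumed discount factor for cached reads

INPUT_TOKENS = 100_300_000    # reported in the breakdown
CACHED_TOKENS = 84_200_000    # reported in the breakdown

without_cache = input_cost(INPUT_TOKENS, 0, PRICE_PER_MTOK, CACHE_READ_MULTIPLIER)
with_cache = input_cost(INPUT_TOKENS, CACHED_TOKENS, PRICE_PER_MTOK, CACHE_READ_MULTIPLIER)

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

Under these assumptions the bill drops substantially, yet the number of input tokens in each request is unchanged, which is the distinction the analysis draws between cost and the re-reading loop.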
📖 Read the full source: r/ClaudeAI