99.4% Input Tokens: Claude Code Analysis of 100M Tokens

Token usage breakdown from 100M tokens tracked

An analysis of Claude Code usage tracked 1,289 requests across extended coding sessions, totaling approximately 100.9M tokens. The breakdown shows a heavy imbalance between input and output: roughly 163 input tokens for every output token.

Token distribution:

Input tokens: 100.3M (99.4% of total)
Cached tokens: 84.2M (84% of input)
Output tokens: 616K (0.6% of total)
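
The percentages above can be re-derived from the raw counts. The short Python sketch below does only that arithmetic; the constant names are illustrative, and the counts are the rounded figures reported in the analysis, so results may differ from the original data in the last digit.

```python
# Token counts as reported in the analysis (rounded figures).
INPUT_TOKENS = 100_300_000
CACHED_TOKENS = 84_200_000   # a subset of the input tokens
OUTPUT_TOKENS = 616_000


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = INPUT_TOKENS + OUTPUT_TOKENS
input_share = share(INPUT_TOKENS, total)           # input as % of all tokens
output_share = share(OUTPUT_TOKENS, total)         # output as % of all tokens
cached_share = share(CACHED_TOKENS, INPUT_TOKENS)  # cached as % of input
input_per_output = round(INPUT_TOKENS / OUTPUT_TOKENS)

print(f"total tokens:            {total:,}")
print(f"input share:             {input_share}%")
print(f"output share:            {output_share}%")
print(f"cached share of input:   {cached_share}%")
print(f"input tokens per output: {input_per_output}")
```

Note that the cached figure is a share of input tokens, not of the total, which is why the three percentages in the list do not sum to 100.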

The context re-reading bottleneck

Claude Code spends 99.4% of its tokens on input (reading context) and only 0.6% on output, which includes the code it writes. According to the analysis, this pattern isn't specific to Claude Code but reflects how agentic coding systems currently operate. Every time Claude Code makes a move (reading a file, running a command, editing code), the full context is sent to the model again, including repository structure, conversation history, tool results, and error logs.

The 84.2M cached tokens are largely the same context being re-sent across the 1,289 requests, about 65K cached tokens per request on average, because the model keeps no persistent memory between turns. Unlike human developers, who maintain a mental model of their codebase, Claude Code follows a loop: forget everything → re-read everything → write code → forget everything again.
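
To see why re-sending the context dominates the totals, the loop can be modeled in a few lines of Python. This is a simplified sketch, not a description of Claude Code's internals: the function name and the session parameters (starting context size, tokens added per turn, number of turns) are hypothetical values chosen for illustration.

```python
def total_input_tokens(turns: int, base_context: int, added_per_turn: int) -> int:
    """Sum the input tokens over a session in which every request
    re-sends the whole accumulated context."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context           # the full context goes in with this request
        context += added_per_turn  # tool results and edits grow the history
    return total


# Hypothetical session: 20K tokens of starting context, 500 new tokens per turn.
TURNS = 50
BASE_CONTEXT = 20_000
ADDED_PER_TURN = 500

sent = total_input_tokens(TURNS, BASE_CONTEXT, ADDED_PER_TURN)
# Size of the context on the final request, i.e. the distinct material read.
distinct = BASE_CONTEXT + ADDED_PER_TURN * (TURNS - 1)

print(f"input tokens sent over the session: {sent:,}")
print(f"distinct context tokens:            {distinct:,}")
print(f"re-read factor:                     {sent / distinct:.1f}x")
```

Under these assumed numbers the session sends about 1.6M input tokens to cover 44,500 tokens of distinct context, a re-read factor of roughly 36. Because each turn re-sends everything before it, the total grows with the square of the number of turns, not linearly.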

Prompt caching limitations

Anthropic's prompt caching makes this process cheaper, and it can reduce per-request latency, but it does not remove the loop itself: the context still has to be sent again on every request. The analysis argues that the bottleneck isn't inference speed but this re-reading loop, and that the real unlock for Claude Code and agentic coding in general would be persistent project memory. That means more than saved facts in memory files or CLAUDE.md: it means a compressed, evolving understanding of the codebase that carries forward across sessions.
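
The "cheaper" part can be made concrete with a rough cost model. The sketch below is an assumption-laden illustration: the function name is hypothetical, the 0.1 cache-read multiplier is an assumed value and not a quoted price (check current pricing before relying on it), and any surcharge for writing to the cache is ignored.

```python
# Token counts as reported in the analysis (rounded figures).
INPUT_TOKENS = 100_300_000
CACHED_TOKENS = 84_200_000

# ASSUMPTION: cached input tokens are billed at this fraction of the normal
# input price. Illustrative only; cache-write surcharges are ignored.
CACHE_READ_MULTIPLIER = 0.1


def relative_input_cost(input_tokens: int, cached_tokens: int,
                        cache_read_multiplier: float) -> float:
    """Input cost as a fraction of what the same tokens would cost uncached."""
    uncached = input_tokens - cached_tokens
    billed = uncached + cached_tokens * cache_read_multiplier
    return billed / input_tokens


ratio = relative_input_cost(INPUT_TOKENS, CACHED_TOKENS, CACHE_READ_MULTIPLIER)
print(f"input cost with caching: {ratio:.1%} of the uncached cost")
print(f"input tokens still sent: {INPUT_TOKENS:,}")
```

Under the assumed multiplier, input cost drops to roughly a quarter of the uncached figure, while the number of tokens sent stays exactly the same. Caching changes the price of the loop, not its shape.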

Current systems essentially brute-force intelligence through repeated context instead of building understanding. If that changes, AI coding could become genuinely faster, because the same information would no longer have to be processed on every request.

📖 Read the full source: r/ClaudeAI
