CLAUDE.md: Cut Claude Output Tokens by 63%

What CLAUDE.md does

CLAUDE.md is a single file you drop into your project root. Once Claude Code reads it, Claude's behavior changes immediately, with no code modifications required. It targets output behavior specifically: sycophancy, verbosity, and formatting noise.

The problem it addresses

By default, Claude wastes tokens on behaviors that don't add value:

Opens responses with "Sure!", "Great question!", "Absolutely!"
Ends with "I hope this helps! Let me know if you need anything!"
Uses em dashes, smart quotes, and other Unicode characters that can break parsers
Restates your question before answering
Adds unsolicited suggestions beyond what you asked
Over-engineers code with unnecessary abstractions
Agrees with incorrect statements ("You're absolutely right!")
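
The behaviors listed above are concrete enough to check for mechanically. The following is a minimal sketch in Python, assuming you want to flag them in captured responses. The function name `find_noise` and the phrase lists are hypothetical, taken from the examples in this article rather than from CLAUDE.md itself.

```python
import re

# Hypothetical phrase lists, copied from the examples in this article.
# They are illustrative and are not the rules contained in CLAUDE.md.
OPENERS = ("Sure!", "Great question!", "Absolutely!")
CLOSERS = ("I hope this helps!", "Let me know if you need anything!")


def find_noise(response: str) -> list[str]:
    """Return a label for each noisy behavior detected in a response."""
    issues = []
    text = response.strip()
    # str.startswith accepts a tuple and matches any of its entries.
    if text.startswith(OPENERS):
        issues.append("sycophantic opener")
    if any(closer in text for closer in CLOSERS):
        issues.append("filler closer")
    if "You're absolutely right" in text:
        issues.append("reflexive agreement")
    # Any character outside 7-bit ASCII (em dashes, smart quotes, and so on).
    if re.search(r"[^\x00-\x7F]", text):
        issues.append("non-ASCII characters")
    return issues


noisy = "Sure! A REST API exposes resources over HTTP. I hope this helps!"
clean = "A REST API exposes resources over HTTP."
print(find_noise(noisy))  # ['sycophantic opener', 'filler closer']
print(find_noise(clean))  # []
```

A check like this only measures surface symptoms. It says nothing about whether the remaining content is correct.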

Benchmark results

The same prompts were run without CLAUDE.md (baseline) and with CLAUDE.md (optimized). Four results are listed, and the total is their sum:

Explain async/await: 180 words → 65 words (64% reduction)
Code review: 120 words → 30 words (75% reduction)
What is a REST API: 110 words → 55 words (50% reduction)
Hallucination correction: 55 words → 20 words (64% reduction)
Total: 465 words → 170 words (63% reduction)

Approximately 384 output tokens saved across the 4 prompts listed. Note: this is a directional indicator from a small prompt set (the source cites 5 prompts), not a statistically controlled study.
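
The figures above can be reproduced with a few lines of arithmetic. The sketch below recomputes each percentage and the total from the listed word counts. The conversion of about 1.3 tokens per word is an assumption: the source does not state how words were converted to tokens, but that ratio is consistent with the 384-token figure.

```python
# Word counts from the benchmark above: (baseline, optimized).
results = {
    "Explain async/await": (180, 65),
    "Code review": (120, 30),
    "What is a REST API": (110, 55),
    "Hallucination correction": (55, 20),
}

# Assumption: about 1.3 tokens per English word. Not stated in the source.
TOKENS_PER_WORD = 1.3


def reduction_pct(baseline: int, optimized: int) -> int:
    """Percentage reduction, rounded to the nearest whole percent."""
    return round(100 * (baseline - optimized) / baseline)


for prompt, (baseline, optimized) in results.items():
    print(f"{prompt}: {reduction_pct(baseline, optimized)}% reduction")

total_baseline = sum(b for b, _ in results.values())
total_optimized = sum(o for _, o in results.values())
words_saved = total_baseline - total_optimized
tokens_saved = words_saved * TOKENS_PER_WORD

print(f"Total: {total_baseline} -> {total_optimized} words "
      f"({reduction_pct(total_baseline, total_optimized)}% reduction)")
print(f"Estimated output tokens saved: {tokens_saved:.1f}")
```

Under that assumed ratio, the 295 words saved come to about 383.5 tokens, which rounds to the 384 quoted.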

When it helps vs. when it doesn't

Works best for:

Automation pipelines with high output volume (resume bots, agent loops, code generation)
Repeated structured tasks where Claude's default verbosity compounds across hundreds of calls
Teams who need consistent, parseable output format across sessions

Not worth it for:

Single short queries (the file loads into context on every message, causing a net token increase on low-output exchanges)
Casual one-off use (the overhead doesn't pay off at low volume)
Fixing deep failure modes like hallucinated implementations or architectural drift
Pipelines using multiple fresh sessions per task
Parser reliability at scale (use structured outputs like JSON mode instead)
Exploratory or architectural work where debate and alternatives are the point

Cost considerations

The CLAUDE.md file itself consumes input tokens on every message. Savings come from reduced output tokens. Net benefit is only positive when output volume is high enough to offset the persistent input cost. At low usage, it costs more than it saves.
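
The trade-off described above can be sketched as a break-even calculation. This is a simplified model, assuming the whole file is billed as input on every message and ignoring prompt caching. The file size and per-token prices are hypothetical placeholders, not measured values or actual pricing.

```python
def break_even_output_tokens(file_tokens: int, input_price: float,
                             output_price: float) -> float:
    """Output tokens that must be saved per message to offset the file's input cost."""
    return file_tokens * input_price / output_price


def net_saving_per_message(file_tokens: int, output_tokens_saved: float,
                           input_price: float, output_price: float) -> float:
    """Net saving per message; prices are expressed per million tokens."""
    saved = output_tokens_saved * output_price
    added = file_tokens * input_price
    return (saved - added) / 1_000_000


# Hypothetical values for illustration only.
FILE_TOKENS = 500    # assumed size of CLAUDE.md in tokens
INPUT_PRICE = 3.0    # assumed price per million input tokens
OUTPUT_PRICE = 15.0  # assumed price per million output tokens

threshold = break_even_output_tokens(FILE_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
print(f"Break-even: {threshold:.0f} output tokens saved per message")

for saved in (40, 150):
    net = net_saving_per_message(FILE_TOKENS, saved, INPUT_PRICE, OUTPUT_PRICE)
    verdict = "saves" if net > 0 else "costs"
    print(f"{saved} output tokens saved per message: {verdict} {abs(net):.6f}")
```

In this model the deciding factor is how many output tokens are saved per message relative to the file's size, which is why short, low-output exchanges come out behind.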

Model support

Benchmarks were run on Claude only. The rules are model-agnostic and should work on any model that reads context, but results on local models (for example, Mistral, or models run through llama.cpp) are untested.

📖 Read the full source: HN AI Agents