CLAUDE.md: Drop-in file reduces Claude output tokens by 63%

What CLAUDE.md does
CLAUDE.md is a single file you drop into your project root. When Claude Code reads it, behavior changes immediately without code modifications. It specifically targets output behavior: sycophancy, verbosity, and formatting noise.
The problem it addresses
By default, Claude wastes tokens on behaviors that don't add value:
- Opens responses with "Sure!", "Great question!", "Absolutely!"
- Ends with "I hope this helps! Let me know if you need anything!"
- Uses em dashes, smart quotes, and other Unicode characters that can break parsers
- Restates your question before answering
- Adds unsolicited suggestions beyond what you asked
- Over-engineers code with unnecessary abstractions
- Agrees with incorrect statements ("You're absolutely right!")
Benchmark results
The same prompts were tested without CLAUDE.md (baseline) and with CLAUDE.md (optimized). The source cites 5 prompts, but the per-prompt results and the totals below cover 4:
- Explain async/await: 180 words → 65 words (64% reduction)
- Code review: 120 words → 30 words (75% reduction)
- What is a REST API: 110 words → 55 words (50% reduction)
- Hallucination correction: 55 words → 20 words (64% reduction)
- Total: 465 words → 170 words (63% reduction)
Approximately 384 output tokens were saved across these 4 prompts. Note: This is a directional indicator from a handful of prompts, not a statistically controlled study.
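The arithmetic behind these figures can be re-derived with a short sketch. The word counts are copied from the list above. The tokens-per-word ratio of 1.3 is an assumption: the source does not state how it converted words to tokens, and this value is used only because it reproduces the quoted figure of roughly 384.

```python
# Word counts (baseline, with CLAUDE.md) copied from the benchmark list.
RESULTS = {
    "Explain async/await": (180, 65),
    "Code review": (120, 30),
    "What is a REST API": (110, 55),
    "Hallucination correction": (55, 20),
}

# Assumption: about 1.3 tokens per English word. The source does not give
# its conversion ratio; this value is chosen because it matches ~384 tokens.
TOKENS_PER_WORD = 1.3


def reduction_pct(baseline: int, optimized: int) -> float:
    """Percentage of words removed relative to the baseline."""
    return 100 * (baseline - optimized) / baseline


def summarize(results: dict) -> dict:
    """Total the per-prompt word counts and estimate the tokens saved."""
    baseline_total = sum(b for b, _ in results.values())
    optimized_total = sum(o for _, o in results.values())
    words_saved = baseline_total - optimized_total
    return {
        "baseline_words": baseline_total,
        "optimized_words": optimized_total,
        "words_saved": words_saved,
        "reduction_pct": reduction_pct(baseline_total, optimized_total),
        "estimated_tokens_saved": words_saved * TOKENS_PER_WORD,
    }


if __name__ == "__main__":
    for name, (baseline, optimized) in RESULTS.items():
        print(f"{name}: {reduction_pct(baseline, optimized):.0f}% reduction")
    print(summarize(RESULTS))
```

Under that assumed ratio, the 295 words saved correspond to about 383.5 tokens, which is consistent with the quoted figure. A real measurement would count tokens with the model's tokenizer rather than estimating from words.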
When it helps vs. when it doesn't
Works best for:
- Automation pipelines with high output volume (resume bots, agent loops, code generation)
- Repeated structured tasks where Claude's default verbosity compounds across hundreds of calls
- Teams who need consistent, parseable output format across sessions
Not worth it for:
- Single short queries (the file loads into context on every message, causing a net token increase on low-output exchanges)
- Casual one-off use (overhead doesn't pay off at low volume)
- Fixing deep failure modes like hallucinated implementations or architectural drift
- Pipelines using multiple fresh sessions per task
- Parser reliability at scale (use structured outputs like JSON mode instead)
- Exploratory or architectural work where debate and alternatives are the point
Cost considerations
The CLAUDE.md file itself consumes input tokens on every message. Savings come from reduced output tokens. Net benefit is only positive when output volume is high enough to offset the persistent input cost. At low usage, it costs more than it saves.
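The trade-off can be sketched as a break-even calculation. This is a minimal model, not the author's method: the function names, the file size of 300 tokens, and the relative prices (output tokens costing 5 times as much as input tokens) are all hypothetical placeholders. It also assumes the whole file is billed as input on every message and ignores prompt caching.

```python
def net_cost_saved(
    file_tokens: int,
    output_tokens_saved: float,
    messages: int,
    input_price: float,
    output_price: float,
) -> float:
    """Cost of output tokens avoided minus cost of re-reading the file.

    Prices are per token in any consistent unit. A positive result
    means the file pays for itself; a negative result means it costs
    more than it saves.
    """
    saved = output_tokens_saved * output_price
    spent = file_tokens * input_price
    return messages * (saved - spent)


def break_even_output_saving(
    file_tokens: int, input_price: float, output_price: float
) -> float:
    """Output tokens that must be saved per message to offset the file."""
    return file_tokens * input_price / output_price


if __name__ == "__main__":
    # Hypothetical values: a 300-token file, output priced at 5x input.
    FILE_TOKENS, INPUT_PRICE, OUTPUT_PRICE = 300, 1.0, 5.0

    print(break_even_output_saving(FILE_TOKENS, INPUT_PRICE, OUTPUT_PRICE))
    # A short exchange that saves only 20 output tokens loses money.
    print(net_cost_saved(FILE_TOKENS, 20, 1, INPUT_PRICE, OUTPUT_PRICE))
    # 96 tokens per message is the benchmark average (384 tokens / 4 prompts).
    print(net_cost_saved(FILE_TOKENS, 96, 1, INPUT_PRICE, OUTPUT_PRICE))
```

With these placeholder numbers the file needs to save 60 output tokens per message to break even, so a 20-token saving is a net loss while the benchmark's average of 96 tokens per prompt is a net gain. Substituting the real file size and current prices gives the threshold for a specific pipeline.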
Model support
Benchmarks were run on Claude only. The rules are model-agnostic and should work on any model that reads context, but results on other models, such as Mistral or local models served through llama.cpp, are untested.
📖 Read the full source: HN AI Agents