monk: Reduce Claude Output Tokens 54% by Silencing Narration

A Reddit user created monk, a skill that makes AI agents work silently — stripping narration, preambles, postambles, and progress commentary from responses, keeping only the results. The effect is an estimated 54% reduction in output tokens per turn (47% coding, 65% chat, 54% research), and compounding context savings that grow with session length.

How it works

monk suppresses all "I'm now doing X..." narration, task-list widgets, and status pings. The agent only outputs standard results at the end of each step. The skill is available on GitHub: github.com/marpxxx/skillz/tree/main/monk.

Benchmark results

Tests used 30 tasks (10 per category: coding, chat, research) with verbosity approximated via OpenAI's cl100k_base tokenizer. Key numbers:

Single-turn output savings: Coding 47%, Chat 65%, Research 54%, Overall 54%.
Context capacity gain (compounding): At ~20 rounds (typical session), +13% (coding), +14% (chat), +20% (research). At 100 rounds, +29% (coding), +36% (chat), +39% (research).
API cost (Claude Sonnet 4.6, prompt caching): ~19% cost saving on a 10-round session.

The test did not count tokens suppressed in tool-use widgets or status pings, so real-world savings may be higher.

Caveats

The verbose samples are AI-generated approximations. A well-tuned base agent may already be terser; a verbose one with narration-heavy skills may produce more. Tokenizer is OpenAI's cl100k_base, not Anthropic's. The 8k system-prompt assumption is conservative (many setups have 15-30k). Results are directional estimates, not production benchmarks.

For developers who rarely read real-time agent output, this skill can reduce noise and stretch the context window significantly.

📖 Read the full source: r/ClaudeAI