Caveman vs 'be brief' prompt: benchmarking compression prompts for Claude

A developer benchmarked caveman (the popular shorthand compression prompt) against the two-word prompt 'be brief.' to see whether caveman's extra complexity pays off. The test ran 24 developer prompts across 6 categories under 5 arms (prompt conditions): baseline, 'be brief.', caveman lite, caveman full, and caveman ultra. A separate Claude instance judged the outputs using per-prompt rubrics.
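The setup is a simple grid. Every arm answers every prompt, a judge scores each answer against that prompt's rubric, and scores and token counts are averaged per arm. The following is a minimal sketch of that loop, assuming hypothetical `generate` and `judge` callables that stand in for the model call and the judging Claude instance. The arm texts other than 'be brief.' are placeholders, not the real caveman prompts, and none of this is taken from the author's open-source harness.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# System prompt per arm. Only "be brief." is quoted in the source; the caveman
# entries are placeholders, not the actual caveman prompt text.
ARMS: Dict[str, Optional[str]] = {
    "baseline": None,  # no compression instruction
    "be brief.": "be brief.",
    "caveman lite": "<caveman lite prompt>",
    "caveman full": "<caveman full prompt>",
    "caveman ultra": "<caveman ultra prompt>",
}

# Hypothetical interfaces: generate(system, prompt) -> (answer, output_tokens),
# judge(prompt, rubric, answer) -> score in [0, 1].
Generate = Callable[[Optional[str], str], Tuple[str, int]]
Judge = Callable[[str, str, str], float]


@dataclass
class Trial:
    arm: str
    prompt_id: str
    score: float
    tokens: int


def run_benchmark(prompts: Dict[str, str], rubrics: Dict[str, str],
                  generate: Generate, judge: Judge) -> Dict[str, Tuple[float, float]]:
    """Run every arm on every prompt; return (mean score, mean tokens) per arm."""
    trials: List[Trial] = []
    for arm, system in ARMS.items():
        for pid, prompt in prompts.items():
            answer, tokens = generate(system, prompt)
            # A separate judge grades the answer against this prompt's rubric.
            trials.append(Trial(arm, pid, judge(prompt, rubrics[pid], answer), tokens))
    summary = {}
    for arm in ARMS:
        rows = [t for t in trials if t.arm == arm]
        summary[arm] = (mean(t.score for t in rows), mean(t.tokens for t in rows))
    return summary


# Offline stub (no model calls): arms with a system prompt answer half as long as baseline.
def fake_generate(system: Optional[str], prompt: str) -> Tuple[str, int]:
    words = prompt.split() * (1 if system else 2)
    return " ".join(words), len(words)


# Offline stub judge: full marks if the answer contains the prompt's first word.
def fake_judge(prompt: str, rubric: str, answer: str) -> float:
    return 1.0 if prompt.split()[0] in answer else 0.0


prompts = {"p1": "reverse a list", "p2": "explain git rebase"}
rubrics = {"p1": "mentions slicing", "p2": "mentions rewriting history"}
for arm, (score, tokens) in run_benchmark(prompts, rubrics, fake_generate, fake_judge).items():
    print(f"{arm}: score={score:.3f} tokens={tokens:.0f}")
```

Because every arm sees the identical prompt set, differences between the per-arm means come from the system prompt alone, up to sampling noise in generation and judging.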
Benchmark results
- Baseline: mean score 0.985, mean tokens 636
- 'be brief.': mean score 0.985, mean tokens 419
- Caveman lite: mean score 0.976, mean tokens 401
- Caveman full: mean score 0.975, mean tokens 404
- Caveman ultra: mean score 0.970, mean tokens 449
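The relative savings implied by these figures are easy to check. This sketch uses only the means listed above to compute each arm's token reduction and score change versus baseline. `relative_to_baseline` is a hypothetical helper name, not part of the author's harness.

```python
# Mean score and mean tokens per arm, copied from the results list above.
RESULTS = {
    "baseline": (0.985, 636),
    "be brief.": (0.985, 419),
    "caveman lite": (0.976, 401),
    "caveman full": (0.975, 404),
    "caveman ultra": (0.970, 449),
}


def relative_to_baseline(results):
    """Return {arm: (percent of baseline tokens saved, change in mean score)}."""
    base_score, base_tokens = results["baseline"]
    out = {}
    for arm, (score, tokens) in results.items():
        if arm == "baseline":
            continue
        token_cut = 1 - tokens / base_tokens  # fraction of baseline tokens saved
        score_delta = score - base_score      # change in mean judge score
        out[arm] = (round(token_cut * 100, 1), round(score_delta, 3))
    return out


for arm, (cut, delta) in relative_to_baseline(RESULTS).items():
    print(f"{arm:14s} tokens -{cut}%  score {delta:+.3f}")
```

On these numbers, 'be brief.' saves about 34.1% of baseline tokens with no change in mean score. Caveman lite and full save 36.5 to 36.9% at a score cost of about 0.01, and caveman ultra saves only about 29.4% while scoring lowest.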
The two-word prompt matched caveman on compression (419 mean tokens versus 401 to 449) and slightly exceeded it on quality (a mean score of 0.985 versus 0.970 to 0.976). Caveman's value lies elsewhere: consistent output structure, mode switching, and a safety escape on destructive operations. That safety escape, however, introduced significant variance in output quality on the safety questions, which may be a concern for some use cases.
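Mean scores alone can hide this kind of variance, because two arms with similar means can differ widely in how consistently they score. The following is a minimal sketch of how one might check the spread on a subset of prompts, such as the safety questions. The `(arm, category, score)` record layout and the function name `spread_by_arm` are hypothetical and are not taken from the author's harness.

```python
from statistics import mean, pstdev


def spread_by_arm(scores, category):
    """Mean and population standard deviation of judge scores per arm, for one category.

    `scores` is a list of (arm, category, score) tuples; this record layout is
    a hypothetical assumption, not the format used by the author's harness.
    """
    by_arm = {}
    for arm, cat, score in scores:
        if cat == category:
            by_arm.setdefault(arm, []).append(score)
    # A high standard deviation with an unchanged mean signals inconsistent answers.
    return {arm: (mean(vals), pstdev(vals)) for arm, vals in by_arm.items()}
```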
A full breakdown, including per-category data and the variance findings on safety questions, is available on the author's site. The benchmark harness is open source on GitHub.
📖 Read the full source: r/ClaudeAI