Be Brief vs Caveman: Benchmarking Compression Prompts for Claude

A developer benchmarked caveman (the popular shorthand compression prompt) against the simple prompt 'be brief.' to see if the extra complexity actually pays off. The test ran 24 dev prompts across 6 categories, comparing 5 arms: baseline, 'be brief.', caveman lite, caveman full, and caveman ultra. Outputs were judged by a separate Claude instance using per-prompt rubrics.
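The setup described above (every prompt run under every arm, each output scored by a judge) can be sketched roughly as follows. This is a minimal illustration, not the author's harness: the caveman prompt texts are placeholders, and `run_model`, `judge`, `fake_model`, and `fake_judge` are hypothetical names standing in for real model and judge calls.

```python
from statistics import mean

# Hypothetical arm definitions. Only 'be brief.' is quoted in the write-up;
# the caveman prompt texts below are placeholders, not the real prompts.
ARMS = {
    "baseline": "",
    "be brief.": "be brief.",
    "caveman lite": "<caveman lite prompt>",
    "caveman full": "<caveman full prompt>",
    "caveman ultra": "<caveman ultra prompt>",
}


def run_benchmark(prompts, run_model, judge):
    """Run every prompt under every arm and aggregate per arm.

    run_model(system, prompt) -> (text, output_tokens)   # assumed signature
    judge(prompt, text)       -> score in [0, 1]         # assumed signature
    """
    results = {}
    for arm, system in ARMS.items():
        scores, tokens = [], []
        for prompt in prompts:
            text, n_tokens = run_model(system, prompt)
            scores.append(judge(prompt, text))
            tokens.append(n_tokens)
        results[arm] = {
            "mean_score": mean(scores),
            "mean_tokens": mean(tokens),
        }
    return results


# Stub model and judge so the sketch runs without any API access.
def fake_model(system, prompt):
    # Pretend any non-empty system prompt halves the output length.
    text = prompt if not system else prompt[: len(prompt) // 2]
    return text, len(text.split())


def fake_judge(prompt, text):
    # Toy rubric: any non-empty answer passes.
    return 1.0 if text else 0.0


if __name__ == "__main__":
    demo_prompts = ["explain git rebase in detail", "what is a mutex"]
    for arm, stats in run_benchmark(demo_prompts, fake_model, fake_judge).items():
        print(arm, stats)
```

In a real run, `run_model` would call the model with the arm's system prompt and `judge` would call a separate model instance with the per-prompt rubric; the stubs only exercise the loop structure.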

Benchmark results

Baseline: mean score 0.985, mean tokens 636
'be brief.': mean score 0.985, mean tokens 419
Caveman lite: mean score 0.976, mean tokens 401
Caveman full: mean score 0.975, mean tokens 404
Caveman ultra: mean score 0.970, mean tokens 449
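
To make the comparison concrete, the figures above can be turned into token reduction and score change relative to baseline. The snippet below is only arithmetic on the reported means; the names `RESULTS`, `token_reduction`, and `score_delta` are illustrative.

```python
# Reported means from the benchmark: arm -> (mean score, mean tokens).
RESULTS = {
    "baseline": (0.985, 636),
    "be brief.": (0.985, 419),
    "caveman lite": (0.976, 401),
    "caveman full": (0.975, 404),
    "caveman ultra": (0.970, 449),
}


def token_reduction(arm, baseline="baseline"):
    """Fraction of mean output tokens saved relative to the baseline arm."""
    return 1 - RESULTS[arm][1] / RESULTS[baseline][1]


def score_delta(arm, baseline="baseline"):
    """Change in mean judge score relative to the baseline arm."""
    return RESULTS[arm][0] - RESULTS[baseline][0]


for arm in RESULTS:
    print(f"{arm:14s} tokens saved {token_reduction(arm):6.1%}  "
          f"score change {score_delta(arm):+.3f}")
```

On these numbers, 'be brief.' saves about 34% of output tokens with no change in score, while the caveman arms save roughly 29% to 37% and score 0.009 to 0.015 lower.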

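The two-word version matched caveman on both compression and quality: it cut mean output from 636 to 419 tokens (about 34%) with no drop in score, while the caveman arms came in between 401 and 449 tokens at slightly lower scores (0.970 to 0.976). Caveman ultra, the most aggressive mode, actually produced more tokens on average than lite or full. According to the write-up, caveman's value lies elsewhere: consistent output structure, mode switching, and the safety escape on destructive operations. The safety escape, however, introduced significant variance in output quality, which may be a concern for certain use cases.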

Full breakdown with per-category data and variance findings on safety questions is available at the author's site. The benchmark harness is open source on GitHub.

📖 Read the full source: r/ClaudeAI

