Opus 4.7 Benchmark: Medium Reasoning Beats High & Max

Reddit user ktane tested Claude Opus 4.7 in Claude Code across five reasoning effort settings (low, medium, high, xhigh, max) on 29 real tasks from the open-source GraphQL-go-tools repository. The result: medium reasoning effort consistently outperformed higher settings on test pass rate, semantic equivalence with human-authored patches, code-review pass rate, and aggregate craft/discipline scores.

Key Results

All-task pass rate: Medium 28/29, Max 27/29, High 26/29, Xhigh 25/29, Low 23/29
Equivalent patches: Medium 14/29, Max 13/29, High 12/29, Xhigh 11/29, Low 10/29
Code-review pass rate: Medium 10/29, High 7/29, Max 8/29, Xhigh 4/29, Low 5/29
Code-review rubric mean: Medium 2.716, High 2.509, Xhigh 2.482, Max 2.431, Low 2.426
Footprint risk (lower is better): Low 0.155, Medium 0.189, High 0.206, Max 0.227, Xhigh 0.238
Cost per task: Low $2.50, Medium $3.15, High $5.01, Xhigh $6.51, Max $8.84
Duration per task: Low 383.8s, Medium 450.7s, High 716.4s, Xhigh 803.8s, Max 996.9s
Equivalent passes per dollar: Low 4.0, Medium 4.4, High 2.4, Xhigh 1.7, Max 1.5

The author notes that Opus 4.7 uses adaptive thinking — it already allocates reasoning budget per task. The effort knob thus biases an already-adaptive policy rather than adding raw intelligence. Notably, in one PR (#1260), high and xhigh settings wasted extra reasoning on digging up commit hashes from prior PRs and concluded 'no work needed', while medium and max correctly read the control flow and produced a fix.

This contrasts with GPT-5.5 in Codex, which showed the intuitive monotonic curve where more reasoning improved quality. The full interactive report with per-task drilldowns is available at stet.sh.

📖 Read the full source: r/ClaudeAI

Opus 4.7 Reasoning Effort Benchmark: Medium Beats High and Max on Real Tasks

Key Results

👀 See Also

AI Engineers Aren't Safe From Being Replaced by AI

Claude Code v2.1.37 Released

Anthropic Urges Global Pause in AI Development, Flags Self-Improvement Risk

Transformer Language Model Runs Locally on Stock Game Boy Color