Opus 4.7 Reasoning Effort Benchmark: Medium Beats High and Max on Real Tasks
Reddit user ktane tested Claude Opus 4.7 in Claude Code across five reasoning effort settings (low, medium, high, xhigh, max) on 29 real tasks from the open-source GraphQL-go-tools repository. The result: medium reasoning effort consistently outperformed higher settings on test pass rate, semantic equivalence with human-authored patches, code-review pass rate, and aggregate craft/discipline scores.
Key Results
- All-task pass rate: Medium 28/29, Max 27/29, High 26/29, Xhigh 25/29, Low 23/29
- Equivalent patches: Medium 14/29, Max 13/29, High 12/29, Xhigh 11/29, Low 10/29
- Code-review pass rate: Medium 10/29, Max 8/29, High 7/29, Low 5/29, Xhigh 4/29
- Code-review rubric mean: Medium 2.716, High 2.509, Xhigh 2.482, Max 2.431, Low 2.426
- Footprint risk (lower is better): Low 0.155, Medium 0.189, High 0.206, Max 0.227, Xhigh 0.238
- Cost per task: Low $2.50, Medium $3.15, High $5.01, Xhigh $6.51, Max $8.84
- Duration per task: Low 383.8s, Medium 450.7s, High 716.4s, Xhigh 803.8s, Max 996.9s
- Equivalent passes per dollar: Low 4.0, Medium 4.4, High 2.4, Xhigh 1.7, Max 1.5
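The source does not spell out how "equivalent passes per dollar" is computed. A minimal sketch, assuming it is the equivalent-patch count divided by the mean cost per task, reproduces the figures in the list above. The dictionary and function names are illustrative and do not come from the original report.

```python
# Figures copied from the Key Results list above.
# Names are hypothetical; the formula is an assumption that happens to
# match the published "equivalent passes per dollar" values.
EQUIVALENT_PATCHES = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
COST_PER_TASK_USD = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_per_dollar(equivalent, cost_per_task):
    """Divide each level's equivalent-patch count by its mean cost per task."""
    return {
        level: round(equivalent[level] / cost_per_task[level], 1)
        for level in equivalent
    }


if __name__ == "__main__":
    ratios = equivalent_per_dollar(EQUIVALENT_PATCHES, COST_PER_TASK_USD)
    # Print levels from most to least cost-efficient.
    for level, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{level:>6}: {ratio}")
```

Under this assumption, medium comes out ahead because it pairs the highest equivalent-patch count with the second-lowest cost, while max has the second-highest count but by far the highest cost.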
The author notes that Opus 4.7 uses adaptive thinking: it already allocates reasoning budget per task. The effort knob therefore biases an already-adaptive policy rather than adding raw intelligence. In one PR (#1260), for example, the high and xhigh settings spent their extra reasoning digging up commit hashes from prior PRs and concluded 'no work needed', while medium and max correctly read the control flow and produced a fix.
This non-monotonic pattern contrasts with GPT-5.5 in Codex, which showed the intuitive monotonic curve where more reasoning improved quality. The full interactive report with per-task drilldowns is available at stet.sh.
📖 Read the full source: r/ClaudeAI