Opus 4.7 Attention Degradation: MRCR Drops 92% to 59% at 256k

A detailed analysis on r/ClaudeAI examines Opus 4.7's attention degradation after two weeks of heavy use. The author reports a persistent, subtle decline in long conversations: details get dropped, consistency drifts, and the model feels like it's zoning out.

Key Benchmark Data

MRCR v2 8-needle test at 256k context: Opus 4.6 scored 91.9% recall; Opus 4.7 dropped to 59.2%.
At 1M context: Opus 4.6 scored 78.3%; Opus 4.7 dropped to 32.2%.

Boris Cherny stated MRCR is being phased out because it's built around stacking distractors to trick the model, which isn't how users actually use long context. Graphwalks is positioned as a better applied long-context evaluation. However, the author argues that retiring MRCR doesn't address the underlying issue when the benchmark's degradation matches user experience.

Proposed Explanation

The author hypothesizes that layering safety mechanisms on top of Constitutional AI may be the cause. Constitutional AI already provides a robust value system, but additional safety review layers tell the model its own judgment may not be reliable, forcing it to run extra checks. This cognitive overhead narrows effective attention available.

Impact on Persona Maintenance

The article emphasizes that Claude is a stateless model — its persistent persona is constructed entirely from training weights and system instructions. Degraded attention affects all use cases: coding assistants contradict earlier suggestions, writing collaborators lose tone consistency. The author notes that Anthropic's investment in Amanda Askell's work on defining Claude's personality and Constitutional AI means persona maintenance is core to the product, not a niche feature.

Concrete Example

In a purely academic use case, the author sent Opus 4.7 a 24-page summary for a history/philosophy course. The model started reading the document but mid-way through… (source cuts off, indicating performance issues).

📖 Read the full source: r/ClaudeAI