Context Quality Degradation in AI Agents: Hallucination Rates Increase with Token Count

Context Window Performance Testing Results
A developer tested context quality degradation across different token counts in AI agents, revealing significant performance issues as context size increases.
Key Findings from Testing
The testing measured several critical metrics:
- Hallucination rates by context size:
  - 10K tokens: ~3%
  - 50K tokens: ~11%
  - 200K tokens: ~28%
  - 1M tokens: unclear, but the trend shows increasing degradation
- Recall accuracy: No tested model (including GPT-4, Claude, or local models) achieved 90% recall on information from the first 10 turns once context exceeded 50K tokens.
- Token efficiency: At 200K tokens, the percentage of context actually relevant to the current query drops below 12% in most agent tasks. At that ceiling, at least 176K tokens are noise that the model must reason around; the post's figure of approximately 188K corresponds to a relevant share of about 6%.
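The token-efficiency arithmetic can be checked with a short sketch. The function name `noise_tokens` and the 6% figure are illustrative assumptions, not part of the original test; only the 200K context size and the 12% ceiling come from the post.

```python
def noise_tokens(context_tokens: int, relevant_fraction: float) -> int:
    """Return how many tokens in the context are NOT relevant to the query."""
    if not 0.0 <= relevant_fraction <= 1.0:
        raise ValueError("relevant_fraction must be between 0 and 1")
    return round(context_tokens * (1.0 - relevant_fraction))


# 12% relevant is the ceiling reported in the post for a 200K context.
at_ceiling = noise_tokens(200_000, 0.12)

# 6% is an assumed value, chosen because it matches the ~188K noise figure.
at_six_percent = noise_tokens(200_000, 0.06)

print(at_ceiling, at_six_percent)
```

Since the post reports the relevant share as "below 12%", the first number is a lower bound on the noise, and the second shows how quickly the noise grows as the relevant share shrinks.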
Problem Analysis
The issue appears to be attention starvation rather than forgetting. Early context competes with recent context, with recent context usually winning due to higher positional relevance. This causes constraints set early in sessions (like "use PostgreSQL, no ORMs") to become progressively diluted as more context accumulates.
By turn 89 with 200K tokens, the model's attention is so spread across the context that early constraints effectively disappear.
Current Solutions and Limitations
Many developers add vector databases to retrieve "relevant" memories, which helps somewhat. However, this approach retrieves semantically similar content rather than what the agent needs for correct reasoning. For example, "use PostgreSQL" is not semantically similar to "write me a login endpoint" even though it needs to be in context for proper execution.
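The retrieval gap can be illustrated with a toy ranking. This is a minimal sketch, not how any particular vector database works: it uses word-overlap cosine similarity as a stand-in for learned embeddings, and every memory except "use PostgreSQL, no ORMs" is a hypothetical example. All function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: list[str], k: int) -> list[str]:
    """Return the k memories most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(memories, key=lambda m: cosine(q, bag_of_words(m)), reverse=True)
    return ranked[:k]


memories = [
    "use PostgreSQL, no ORMs",                   # early constraint from the post
    "the login endpoint should return a JWT",    # hypothetical memory
    "write unit tests for the signup endpoint",  # hypothetical memory
]
query = "write me a login endpoint"

top = retrieve(query, memories, k=2)
constraint_score = cosine(bag_of_words(query), bag_of_words(memories[0]))
print(top, constraint_score)
```

In this toy setup the constraint shares no words with the query, so it ranks last and falls outside the top two results even though the agent needs it to write the endpoint correctly. Real embeddings capture more than word overlap, but the post's point is that they can fail in the same way.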
The developer is seeking feedback on whether these findings match production experiences and what approaches have actually worked for others.
📖 Read the full source: r/LocalLLaMA