Claude AI Session Compaction Issues and Workarounds

✍️ OpenClawRadar · 📅 Published: March 17, 2026

How Compaction Works

Claude Code sessions are stored as JSONL files at ~/.claude/projects/{encoded-cwd}/sessions/{id}.jsonl, with each conversation turn written as a JSON block. When compaction triggers, the original blocks stay in the file and a new block containing a compressed summary is appended. From that point on, the model works from the summary rather than the full conversation history.
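A minimal sketch of how such a session file could be inspected, assuming each line holds one JSON object and that a compaction summary is marked by a `type` field equal to `"summary"`. The field name and value are assumptions for illustration, not a documented schema; the helper names are hypothetical too.

```python
import json
from pathlib import Path


def load_blocks(session_path):
    """Read a JSONL session file into a list, skipping blank or corrupt lines."""
    blocks = []
    for line in Path(session_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            blocks.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a partially written final line
    return blocks


def active_context(blocks, summary_type="summary"):
    """Return the latest summary block and everything appended after it.

    `summary_type` is a hypothetical marker; adjust it to whatever the
    real files use. With no summary present, the full history is returned.
    """
    last = -1
    for i, block in enumerate(blocks):
        if isinstance(block, dict) and block.get("type") == summary_type:
            last = i
    return blocks if last == -1 else blocks[last:]
```

Because the original blocks are never deleted, comparing `len(load_blocks(path))` with `len(active_context(...))` gives a rough sense of how much history sits behind the summary.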

Test Results

Working in a coding project session at 90% context fill (before the increase to a 1 million token context window), a Reddit user asked 10 questions covering simple recall, 6-hop dependency chains, entity disambiguation, negation chaining, absence detection, and conflict detection.

  • Pre-compaction: ~9.75/10 accuracy, with Opus 4.6 finding facts scattered across 418K tokens.
  • Post-compaction (Default): ~5/10 accuracy with a 3,461-token summary (121x compression). The same session and the same questions now produced hallucinated, incorrect answers.
  • Post-compaction (Manual Opus): ~9.75/10 accuracy with a 6,080-token summary (69x compression). A custom compaction prompt run with Opus preserved the important information.
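The compression figures above follow directly from the token counts. A quick check, assuming the reported "418K" means roughly 418,000 tokens (the exact count is not given):

```python
def compression_ratio(original_tokens, summary_tokens):
    """How many times smaller the summary is than the original context."""
    return original_tokens / summary_tokens


ORIGINAL = 418_000  # assumption: "418K" taken as 418,000

for label, summary in [("default", 3_461), ("manual Opus", 6_080)]:
    print(f"{label}: {compression_ratio(ORIGINAL, summary):.0f}x")
# prints:
# default: 121x
# manual Opus: 69x
```

The custom summary costs 2,619 more tokens than the default one, which is small next to the 418K-token original.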

Why the Difference

According to Anthropic's documentation, the API defaults to using the same model for compaction. The user was running Opus 4.6 on medium compute, so default compaction should have used Opus too. Since the model was the same in both cases, the quality gap points to the default summarization prompt, the thinking/compute budget given to compaction, or both.

Workarounds

Approach 1: Opus Compaction - Turn off auto-compaction and run a background process that tracks token counts for each Claude Code instance. When a session nears its limit, trigger compaction using Opus with a custom prompt, potentially after asking the user for authorization.
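A rough sketch of the monitoring half of Approach 1. Everything here is an assumption for illustration: the 4-characters-per-token heuristic stands in for a real tokenizer, the limit and threshold are placeholders, and `on_trigger` is a hypothetical callback where the custom Opus compaction (and any user confirmation) would be wired in.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4       # crude heuristic, not a real tokenizer
CONTEXT_LIMIT = 200_000   # hypothetical context window size
TRIGGER_FRACTION = 0.80   # hypothetical fill level that triggers compaction


def estimate_tokens(session_path):
    """Estimate tokens from the raw size of non-blank lines in a JSONL file.

    This overcounts: JSON overhead is included, and blocks that were
    already compacted remain in the file.
    """
    total_chars = 0
    for line in Path(session_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            total_chars += len(line)
    return total_chars // CHARS_PER_TOKEN


def should_compact(tokens, limit=CONTEXT_LIMIT, fraction=TRIGGER_FRACTION):
    """True once the estimated fill reaches the trigger fraction."""
    return tokens >= limit * fraction


def poll(projects_dir, on_trigger, limit=CONTEXT_LIMIT, fraction=TRIGGER_FRACTION):
    """One polling pass: call on_trigger(path, tokens) for each full session."""
    triggered = []
    for path in sorted(Path(projects_dir).expanduser().rglob("*.jsonl")):
        tokens = estimate_tokens(path)
        if should_compact(tokens, limit, fraction):
            on_trigger(path, tokens)
            triggered.append(path)
    return triggered
```

A background process would call `poll("~/.claude/projects", handler)` in a loop with a sleep between passes. Keeping the trigger as a callback leaves the compaction step, and the decision to ask the user first, outside the monitor.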

Approach 2: spaCy NER Pre-seeding - Instead of starting sub-agents with zero context, use spaCy NER to extract proper nouns, numbers, service names, ports, and key identifiers from project files. Inject the result as a lightweight entity briefing (a few hundred tokens) at startup, so agents know which resources already exist without any narrative bloat.
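One way Approach 2 might look, as a sketch rather than the original author's implementation. It assumes spaCy and the `en_core_web_sm` model are installed; the label set, the port regex, the briefing format, and all function names are illustrative choices.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: these spaCy labels roughly cover names, services, and numbers.
KEEP_LABELS = {"ORG", "PRODUCT", "PERSON", "GPE", "CARDINAL"}
# Crude heuristic for ports such as "localhost:5432" or "port 8080";
# it can also catch unrelated numbers, for example the minutes in "12:30".
PORT_RE = re.compile(r"\b(?:port\s+|:)(\d{2,5})\b", re.IGNORECASE)


def extract_entities(text, nlp):
    """Run a spaCy pipeline over text and return (text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in KEEP_LABELS]


def build_briefing(entities, ports=(), max_items=30):
    """Format the most frequent entities as a short plain-text briefing."""
    lines = ["Known entities in this project:"]
    for (text, label), n in Counter(entities).most_common(max_items):
        lines.append(f"- {text} ({label}, seen {n}x)")
    if ports:
        lines.append("Ports: " + ", ".join(sorted(set(ports), key=int)))
    return "\n".join(lines)


def briefing_for_files(paths, model="en_core_web_sm"):
    """Build a briefing from project files; needs spaCy and the model installed."""
    import spacy  # imported lazily so the helpers above work without it

    nlp = spacy.load(model)
    entities, ports = [], []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        entities.extend(extract_entities(text, nlp))
        ports.extend(PORT_RE.findall(text))
    return build_briefing(entities, ports)
```

Capping the list with `max_items` is what keeps the briefing in the few-hundred-token range; the returned string would be prepended to a sub-agent's startup prompt.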

📖 Read the full source: r/ClaudeAI
