OpenClaw Implements Agent History Compression to Reduce Context Usage

✍️ OpenClawRadar📅 Published: March 10, 2026🔗 Source
OpenClaw Implements Agent History Compression to Reduce Context Usage
Ad

Context Management Problem

When running OpenClaw inside Docker, direct code writing by the agent fills context with noise: reading files (5K tokens), writing edits (500 tokens), running tests (200 tokens), and receiving stack traces (3K tokens). A single debug cycle consumes 10K-15K tokens, mostly from console output and stack traces that become useless after bug fixes. With 20-30 debug cycles per session, the entire context window gets consumed by noise.

Brain/Worker Architecture

The solution involves separating responsibilities: OpenClawd (in Docker) acts as the brain for planning, breaking work into subtasks, delegating, and coordinating. A local worker on the macOS host, powered by Qwen3.5-27B running on Apple Silicon via MLX with zero cost, serves as the hands for reading files, writing code, running tests, and debugging. This keeps noisy back-and-forth in the worker's context, with the brain only seeing final results like "task done, here are the files that changed."

Compression Strategy

Even with the brain/worker split, the orchestrator's context still fills up with operating docs: AGENTS (~6.6K tokens), SOUL (~1.5K tokens), LESSONS (~10K tokens), and plans/walkthroughs (~13K tokens on disk), totaling 20K-30K tokens before any work begins. Sessions can reach 100K-200K tokens.

The key insight: finished work doesn't need raw detail. Once a subtask is completed, its raw history becomes dead weight. The agent only needs to know: what was the task, did it succeed, what files changed, and any errors.

Ad

Implementation Details

Step 1: Detect lifecycle boundaries - The orchestrator decomposes work into subtasks with lifecycles: Spawn (agent calls sessions_spawn or delegate_task), Execute (tool calls, reasoning), and Complete (System Message "subagent 'task_name' completed"). A 4-pass scanner walks the session JSONL:

  • Pass 1: Find spawn events
  • Pass 2: Find spawn errors
  • Pass 3: Find completion markers
  • Pass 4: Compute tokens count and duration per lifecycle

This identifies message ranges belonging to completed subtasks.

Step 2: Summarize in "agent-language" (masking) - Summaries are generated to look like normal agent output to maintain compatibility with the orchestrator's expected message format (roles, content blocks, tool call structures, parent-child ID chains). These masked summaries replace raw task history.

Example compacted task summary:

── COMPACTED TASK ──
origin: agent
task: Implement idle timeout for MLX server
outcome: success
result: Added 5-min idle timer to MlxServerManager.
Server auto-unloads when no requests received.
files+: src/services/mlx_idle_monitor.py
files~: src/services/mlx_server.py, config.json
errors: none
tried_and_failed: threading.Timer — race condition
must_remember: MLX server must only reload on explicit worker request, not any tool call
─────────────────

This ~100 token summary replaces 5K tokens of raw tool calls and reasoning (99.2% reduction). Summaries are generated by a cheap LLM (Gemini Flash Lite or local MLX), with fallback mechanisms if generation fails.

📖 Read the full source: r/openclaw

Ad

👀 See Also

Creation OS: A Local σ-Gated LLM Runtime That Lets Models Say ‘I Don’t Know’ Instead of Hallucinating
Tools

Creation OS: A Local σ-Gated LLM Runtime That Lets Models Say ‘I Don’t Know’ Instead of Hallucinating

Creation OS wraps local LLMs (BitNet, Qwen, Gemma, any GGUF) with a σ-gate that measures multiple uncertainty channels and decides ACCEPT, RETHINK, or ABSTAIN per output. No cloud, no API. TruthfulQA accuracy improved ~29% via selective regeneration.

OpenClawRadar
Graph Compose: Hosted Temporal Workflows with Visual Builder and AI
Tools

Graph Compose: Hosted Temporal Workflows with Visual Builder and AI

Graph Compose is a hosted platform for orchestrating API workflows on Temporal, letting you define workflows as JSON graphs with three building methods: a React Flow visual builder, a TypeScript SDK, and an AI assistant that converts plain English to graphs.

OpenClawRadar
Broccoli: Open-source harness for running AI coding agents from Linear tickets in cloud sandboxes
Tools

Broccoli: Open-source harness for running AI coding agents from Linear tickets in cloud sandboxes

Broccoli is an open-source tool that takes coding tasks from Linear, executes them in isolated cloud sandboxes using Claude and Codex, and opens PRs for human review. It runs on your own Google Cloud infrastructure with production-grade deployment.

OpenClawRadar
Session Inspector for Claude Code provides real-time visibility into AI agent operations
Tools

Session Inspector for Claude Code provides real-time visibility into AI agent operations

Vibeyard, an open-source terminal IDE that wraps Claude Code, has added a Session Inspector feature that provides real-time visibility into Claude Code sessions with timeline tracking, cost breakdowns, tool analytics, and context window monitoring.

OpenClawRadar