Dual-model architecture reduces token consumption by half for long conversations

Context compression system for AI agents
A developer on r/ClaudeAI shared a system that addresses a common problem: AI agents losing context after conversation compaction. It uses a dual-model architecture in which a cheap, small model (the "subconscious") continuously compresses conversation history in the background.
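The post does not include code, so the following is only a minimal sketch of the background-compression idea, assuming a rolling window of raw turns. The names `Subconscious`, `observe`, `keep_recent`, and `toy_summarizer` are hypothetical, and the summarizer is a placeholder for a call to the small model. For simplicity the sketch runs synchronously; a real system would presumably run this step in the background.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a call to the cheap small model: it takes the
# current summary plus the turns being retired and returns an updated summary.
Summarizer = Callable[[str, List[str]], str]


@dataclass
class Subconscious:
    summarize: Summarizer
    keep_recent: int = 4  # assumed number of raw turns left uncompressed
    summary: str = ""
    recent: List[str] = field(default_factory=list)

    def observe(self, turn: str) -> None:
        """Record a new turn and fold any overflow turns into the summary."""
        self.recent.append(turn)
        overflow = len(self.recent) - self.keep_recent
        if overflow > 0:
            retired = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            self.summary = self.summarize(self.summary, retired)


def toy_summarizer(summary: str, retired: List[str]) -> str:
    # Placeholder logic only: keeps the first 20 characters of each retired turn.
    parts = ([summary] if summary else []) + [t[:20] for t in retired]
    return " | ".join(parts)
```

In use, each new turn would be passed to `observe`, and the main model's prompt would be built from `summary` and `recent` rather than from the full history.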
Architecture details
The system has four layers:
- Narrative summary (~1K tokens)
- Compressed factoids
- Semantically retrieved verbatim quotes
- Raw recent turns
The main model (the "conscious") receives a curated ~35K-token context that carries the information that would normally require about 120K tokens of raw history. It reads one coherent timeline and is not aware that the memory system exists.
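One way the four layers could be packed into a single context is sketched below. This is an illustration under stated assumptions, not the developer's implementation: `MemoryLayers`, `build_context`, and `estimate_tokens` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the priority order (narrative, then recent turns, then factoids, then quotes) is an assumption. Only the four layer types and the ~35K budget come from the post.

```python
from dataclasses import dataclass
from typing import List


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. A real system would use
    # the target model's tokenizer instead.
    return max(1, len(text) // 4)


@dataclass
class MemoryLayers:
    narrative: str           # narrative summary (~1K tokens in the post)
    factoids: List[str]      # compressed factoids
    quotes: List[str]        # verbatim quotes, already retrieved and ranked
    recent_turns: List[str]  # raw recent turns, oldest first


def build_context(layers: MemoryLayers, budget: int = 35_000) -> str:
    """Pack the four layers into one timeline-ordered prompt within budget."""
    used = estimate_tokens(layers.narrative)

    # Assumed priority: newest raw turns first, walking backwards in time.
    kept_turns: List[str] = []
    for turn in reversed(layers.recent_turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept_turns.append(turn)
        used += cost
    kept_turns.reverse()

    # Fill the remaining space with factoids, then quotes, in ranked order.
    kept_factoids: List[str] = []
    kept_quotes: List[str] = []
    for source, kept in ((layers.factoids, kept_factoids),
                         (layers.quotes, kept_quotes)):
        for item in source:
            cost = estimate_tokens(item)
            if used + cost > budget:
                break
            kept.append(item)
            used += cost

    # Most compressed material first, raw recent turns last, so the main
    # model reads a single coherent timeline.
    return "\n".join(
        [layers.narrative, *kept_factoids, *kept_quotes, *kept_turns]
    )
```

Because the output is one plain string, the main model sees no trace of the layering, which matches the post's description of a memory system the model does not know exists.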
Performance results
The developer simulated 260 turns across different conversation types. For sustained project work (starting with heavy research and gradually shifting to quick exchanges as the model learns the domain), the system cuts token consumption roughly in half.
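The post does not describe how the savings were measured. As a hedged sketch of the accounting, a comparison like this needs to count the small model's tokens as overhead alongside the main model's. The function name `savings_fraction` and its parameters are hypothetical, and the numbers in the usage lines are toy values, not results from the simulation.

```python
from typing import Sequence


def savings_fraction(
    baseline: Sequence[int],
    main: Sequence[int],
    compressor: Sequence[int],
) -> float:
    """Fraction of baseline tokens saved, with compressor overhead included.

    Each argument holds per-turn token counts: `baseline` for the main model
    on raw history, `main` for the main model on the curated context, and
    `compressor` for the small model doing the compression.
    """
    base = sum(baseline)
    if base == 0:
        raise ValueError("baseline must consume at least one token")
    return 1.0 - (sum(main) + sum(compressor)) / base


# Toy values for three turns, for illustration only.
toy = savings_fraction(
    baseline=[100, 200, 300],
    main=[100, 100, 100],
    compressor=[0, 50, 50],
)
```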
Development tools
The developer used Claude Code to build the simulation and Claude.ai for the consulting and research stage. They are looking for others who have tried using a smaller model to manage context for a larger one, or who have found other workarounds for the compaction problem.
📖 Read the full source: r/ClaudeAI