Routing Claude API traffic to control costs after Max subscription change

API billing migration and cost implications
As of noon PT, Anthropic's Max subscription no longer covers usage from third-party tools like OpenClaw. All OpenClaw users are now on API billing with these rates:
- Claude Opus 4.6: $5 per million input tokens, $25 per million output tokens
- Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens
- Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens
A heavy OpenClaw session on Opus can cost $1-4, while the same session on Sonnet costs $0.20-0.80 with similar results for most tasks.
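To make the rate list concrete, here is a minimal sketch that prices a single session at each model's list rate. The token counts (300,000 input, 40,000 output) are hypothetical and chosen only for illustration, and session_cost is an illustrative helper, not part of any library. Prompt caching and batch discounts are ignored.

```python
# USD per million tokens (input, output), copied from the rate list above.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for one session on the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical session size (an assumption, not a measured OpenClaw figure).
INPUT_TOKENS = 300_000
OUTPUT_TOKENS = 40_000

for model in PRICES:
    cost = session_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.2f}")
# opus: $2.50
# sonnet: $1.50
# haiku: $0.50
```

At identical token counts, the list prices put Sonnet at 60% of the Opus cost. A larger gap, such as the one quoted above, would have to come from other factors (for example, different token usage per session), which the source does not break down.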

The routing solution
Most OpenClaw operations don't require Opus: heartbeat checks, file reads, summaries, routing decisions, and short tool calls can all be handled by Sonnet. Without a routing layer, every request hits your default model, potentially wasting Opus budget on simple tasks.
A local proxy routes Claude requests by complexity: simple tasks go to Sonnet automatically, and complex ones escalate to Opus. According to the author, this approach has significantly reduced costs without quality loss on important tasks.
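The idea of routing by complexity can be sketched as a small decision function. This is an illustration only and not the proxy's actual logic: the task labels, the character threshold, and the choose_model name are all assumptions made for this example, and the returned strings are plain labels rather than real API model identifiers.

```python
# Task types the post lists as not needing Opus (labels are hypothetical).
SIMPLE_TASKS = {"heartbeat", "file_read", "summary", "routing", "tool_call"}

# Assumed cutoff: very long prompts escalate even for "simple" task types.
MAX_SIMPLE_PROMPT_CHARS = 8_000


def choose_model(task_type: str, prompt: str) -> str:
    """Pick a model label for a request using a crude complexity heuristic."""
    is_simple = task_type in SIMPLE_TASKS
    is_short = len(prompt) <= MAX_SIMPLE_PROMPT_CHARS
    if is_simple and is_short:
        return "sonnet"
    # Unknown task types and long prompts fall through to the stronger model.
    return "opus"


print(choose_model("heartbeat", "ping"))                  # sonnet
print(choose_model("refactor", "Rewrite the scheduler"))  # opus
print(choose_model("summary", "x" * 9_000))               # opus
```

Defaulting to Opus for anything unrecognized is a deliberately conservative choice in this sketch: a misclassified request costs more money, not quality.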
The proxy is open source and installable via npm:

    npm install -g @relayplane/proxy
Detailed documentation and discussion are available on r/ClaudeCode, where the solution has received 52K views.
📖 Read the full source: r/openclaw