Save on Claude Code Bills by Routing Planning Tokens to Cheaper Models

A Reddit user reports saving about $40 in overage fees on Claude Code last month by splitting token usage across models. The key insight: planning steps (especially in multi-file refactors) can consume up to 80% of the token budget, but most planning doesn't need the most expensive model.
How It Works
They wrote a 30-line wrapper that routes the initial 'figure out what to change' work to Haiku 3.5 — a cheaper model. Only the actual edits and decision-making stay on Opus or Sonnet. The setup took about 2 hours, including figuring out which steps were worth handing off.
Results
Last cycle ended with budget left over for the first time in 4 months. The user avoided the usual 2-day wait for the reset window. Savings: roughly $40 in overage fees.
# Pseudocode for the wrapper logic:
# 1. Send planning prompt to haiku-3.5
# 2. Get back a list of files and changes
# 3. Pass the plan + instruction to opus/sonnet for actual edits
Caveats
Haiku's planning quality is noticeably worse on architecture decisions. For refactor-and-test workflows where Opus picks up the real decisions anyway, it's fine. For greenfield design ('what should this app even be'), the user still lets Opus plan from scratch.
The user notes that this pattern is 'probably obvious to anyone who's looked at the OpenRouter model pricing tables,' but the Claude Code subagent docs are thin on this exact approach.
📖 Read the full source: r/ClaudeAI
👀 See Also

Annotation-Driven UI: How to Design Templates in Figma and Let Claude Extract Coordinates
Skip building a custom layout engine: design flat PNGs in Figma, draw colored rectangles for slots, feed both to Claude, and get editable area definitions with tap targets. One afternoon instead of weeks.

Governance Layer for Claude Agents: Hard Safety Boundaries and Live Traces in Production
A Claude API user built a lightweight governance layer below the agent to add hard safety boundaries, real-time traces, human-in-the-loop control via Telegram, and automatic checkpointing — solving silent failures and runaway token costs in long-running agent loops.

How to Fix Claude Code's CSS Guesswork with a Design System
A developer found Claude Code repeatedly regenerated misaligned HTML/CSS because it designs blind without visual feedback. The solution: provide a complete design system with spacing, colors, and type variables, then separate HTML and CSS prompts.

6 Loop Types Found in Production AI Agents: A Week-Long Log Analysis
Analysis of 670 events from 5 production agents over a week reveals 6 high-severity loop patterns including decision oscillation, retry loops, ping pong loops, recall-write loops, reflection loops, and tool non-determinism.