Save on Claude Code Bills by Routing Planning Tokens to Cheaper Models

✍️ OpenClawRadar | 📅 Published: May 8, 2026

A Reddit user reports saving about $40 in overage fees on Claude Code last month by splitting token usage across models. The key insight: planning steps (especially in multi-file refactors) can consume up to 80% of the token budget, but most planning doesn't need the most expensive model.
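The arithmetic behind that claim can be sketched roughly as follows. This is a back-of-the-envelope model, not something from the original post: it assumes every planning token moves to the cheaper model, that the token count stays the same after the move, and that `price_ratio` (cheap price divided by expensive price) is a number you look up yourself. The function name and the 0.25 ratio in the usage note are hypothetical.

```python
def fraction_saved(planning_share: float, price_ratio: float) -> float:
    """Estimate the share of token spend saved by moving planning to a cheaper model.

    planning_share: fraction of spend that goes to planning (0.0 to 1.0).
    price_ratio: cheap model price / expensive model price (0.0 to 1.0).
    Both inputs are assumptions supplied by the caller, not measured values.
    """
    if not 0.0 <= planning_share <= 1.0:
        raise ValueError("planning_share must be between 0 and 1")
    if not 0.0 <= price_ratio <= 1.0:
        raise ValueError("price_ratio must be between 0 and 1")
    # Only the planning share gets cheaper, and only by (1 - price_ratio).
    return planning_share * (1.0 - price_ratio)
```

With the article's upper bound of 80% planning and a made-up price ratio of 0.25, `fraction_saved(0.8, 0.25)` works out to roughly 60% of token spend. Real savings depend on actual prices and on how much planning the cheaper model can really absorb.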

How It Works

The user wrote a 30-line wrapper that routes the initial 'figure out what to change' work to Haiku 3.5, a cheaper model. Only the actual edits and decision-making stay on Opus or Sonnet. The setup took about 2 hours, including figuring out which steps were worth handing off.

Results

The user's last billing cycle ended with budget left over for the first time in 4 months, which meant skipping the usual 2-day wait for the reset window. Savings: roughly $40 in overage fees.

# Pseudocode for the wrapper logic:
# 1. Send planning prompt to haiku-3.5
# 2. Get back a list of files and changes
# 3. Pass the plan + instruction to opus/sonnet for actual edits
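The three steps above can be sketched in Python as follows. This is a minimal illustration, not the Reddit user's actual wrapper: the function names, the `ModelCall` type, the prompt wording, and the model labels are all assumptions. The model call is passed in as a plain callable so the sketch does not depend on any particular SDK.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (model_label, prompt) -> response text.
# In practice this would wrap whatever API client you already use.
ModelCall = Callable[[str, str], str]

PLANNER_MODEL = "haiku-3.5"  # label taken from the pseudocode, not a verified API model ID
EDITOR_MODEL = "opus"        # likewise a placeholder label


def plan_changes(call_model: ModelCall, instruction: str) -> List[str]:
    """Step 1 and 2: ask the cheaper model for one planned change per line."""
    prompt = (
        "List the files to change and what to change in each, "
        "one per line, for this task:\n" + instruction
    )
    reply = call_model(PLANNER_MODEL, prompt)
    # Drop blank lines so the plan is a clean list of entries.
    return [line.strip() for line in reply.splitlines() if line.strip()]


def apply_edits(call_model: ModelCall, instruction: str, plan: List[str]) -> str:
    """Step 3: hand the plan plus the original instruction to the expensive model."""
    prompt = "Task:\n" + instruction + "\n\nPlan:\n" + "\n".join(plan)
    return call_model(EDITOR_MODEL, prompt)


def route_task(call_model: ModelCall, instruction: str) -> Dict[str, object]:
    """Run planning on the cheap model, then edits on the expensive one."""
    plan = plan_changes(call_model, instruction)
    result = apply_edits(call_model, instruction, plan)
    return {"plan": plan, "result": result}
```

Keeping the model call injectable also makes it easy to log which model handled which step, which is one way to check whether the split is actually moving tokens off the expensive model.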

Caveats

Haiku's planning quality is noticeably worse on architecture decisions. For refactor-and-test workflows where Opus picks up the real decisions anyway, it's fine. For greenfield design ('what should this app even be'), the user still lets Opus plan from scratch.
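One way to encode that caveat is a small rule that picks the planning model by task type. The sketch below is illustrative only: the `TaskKind` enum, the function name, and the model labels are assumptions, and the post does not say how the user actually makes this choice.

```python
from enum import Enum


class TaskKind(Enum):
    REFACTOR = "refactor"      # refactor-and-test work with a known target
    GREENFIELD = "greenfield"  # open-ended design from scratch


CHEAP_PLANNER = "haiku-3.5"  # placeholder label, not a verified API model ID
STRONG_PLANNER = "opus"      # placeholder label


def pick_planner(kind: TaskKind) -> str:
    """Return the model label that should do the planning for this task."""
    if kind is TaskKind.GREENFIELD:
        # Architecture decisions are where the cheaper model reportedly falls short.
        return STRONG_PLANNER
    return CHEAP_PLANNER
```

Here `pick_planner(TaskKind.REFACTOR)` returns the cheaper label and `pick_planner(TaskKind.GREENFIELD)` returns the stronger one, mirroring the split described above.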

The user notes that this pattern is 'probably obvious to anyone who's looked at the OpenRouter model pricing tables,' but the Claude Code subagent docs are thin on this exact approach.

📖 Read the full source: r/ClaudeAI
