Routing Agent Subtasks to Cheaper Models Dropped Cost from $18 to $4 on Same Refactor

One developer on r/ClaudeAI describes a practical cost-optimization strategy for agent loops: route routine subtasks to cheap models and reserve the expensive one (Opus 4.7) for complex reasoning. Their refactoring agent, which handles CSS variable renames, YAML config updates, and linter runs via MCP, originally sent every step to Opus 4.7 for a total of about $18. After adding routing logic, 178 of 212 steps went to cheap models, cutting the cost to roughly $4 with no observable quality difference on routine changes.
Routing Logic
- Hard subtasks → Opus 4.7: Component architecture, debugging code written at 2am, and anything requiring sustained reasoning across long conversations. The author says Opus is genuinely unmatched at this kind of work: an earlier attempt to route an auth middleware bug to a cheaper model silently broke session handling and took an hour to trace.
- Routine subtasks → cheaper models: Lint, rename, config edits, tool orchestration. The author settled on DeepSeek V4 Pro for general coding chores and Tencent Hunyuan Hy3 preview for heavy tool calling. As of late April, Hunyuan Hy3 ranked #1 on OpenRouter by tool-call volume, and in the author's experience it almost never botches a function call when the schema is clean.
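A minimal sketch of this kind of category-based router, assuming each subtask arrives already labeled with a kind. The task labels and model identifier strings below are hypothetical placeholders, not real API model IDs, and this is not the author's actual routing code.

```python
from enum import Enum


class Tier(Enum):
    # Hypothetical placeholder identifiers, not real provider model IDs.
    HARD = "opus-4.7"
    CODING = "deepseek-v4-pro"
    TOOLS = "hunyuan-hy3-preview"


# Hypothetical task labels mirroring the categories described in the post.
HARD_TASKS = {"architecture", "debugging", "long_reasoning"}
TOOL_TASKS = {"tool_orchestration"}
CODING_CHORES = {"lint", "rename", "config_edit"}


def route(task_kind: str) -> Tier:
    """Pick a model tier for a subtask; unrecognized kinds fall back to the strong model."""
    if task_kind in HARD_TASKS:
        return Tier.HARD
    if task_kind in TOOL_TASKS:
        return Tier.TOOLS
    if task_kind in CODING_CHORES:
        return Tier.CODING
    # Fail safe: anything unclassified goes to the expensive model.
    return Tier.HARD


for kind in ["rename", "tool_orchestration", "debugging", "auth_middleware_fix"]:
    print(kind, "->", route(kind).value)
```

Defaulting unknown task kinds to the expensive tier is a design choice made here, not something the post specifies; it trades some savings for protection against the kind of silent failure the author hit with the auth middleware bug.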
Cost Comparison
- Opus 4.7: roughly $5 per million input tokens (not quoted directly; implied by the ~28x ratio against Hy3's $0.18).
- Tencent Hunyuan Hy3: $0.18 per million input tokens, $0.59 per million output — roughly 28x cheaper than Opus 4.7 on input.
- Same 212-step refactor: 178 steps to cheap tier, 34 steps to Opus. Cost dropped from $18 to ~$4.
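The arithmetic behind these figures can be checked directly. The only derived number is the Opus input price, obtained by multiplying Hy3's quoted $0.18 by the stated ~28x ratio; everything else is taken from the post.

```python
# Figures quoted in the post; only the Opus input price is derived.
HY3_INPUT_USD_PER_M = 0.18   # Hunyuan Hy3, per million input tokens
INPUT_PRICE_RATIO = 28       # "roughly 28x cheaper than Opus 4.7 on input"

# Implied Opus 4.7 input price: about $5.04 per million tokens.
implied_opus_input_usd_per_m = HY3_INPUT_USD_PER_M * INPUT_PRICE_RATIO

TOTAL_STEPS, CHEAP_STEPS = 212, 178
opus_steps = TOTAL_STEPS - CHEAP_STEPS        # 34 steps stayed on Opus
cheap_share = CHEAP_STEPS / TOTAL_STEPS       # about 84% of steps rerouted

COST_BEFORE_USD, COST_AFTER_USD = 18.0, 4.0
savings = 1 - COST_AFTER_USD / COST_BEFORE_USD  # about 78% cheaper overall

print(f"Implied Opus input price: ${implied_opus_input_usd_per_m:.2f}/M tokens")
print(f"Steps on Opus: {opus_steps} ({1 - cheap_share:.0%} of the run)")
print(f"Overall cost reduction: {savings:.0%}")
```

The cost reduction (about 78%) trails the share of steps rerouted (about 84%), which fits the expectation that the 34 steps left on Opus, plus the non-zero cheap-tier spend, make up the remaining ~$4.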
Failure Modes
- The tool-calling model hallucinates parameters when schemas are sloppy, and the author admits their schemas were.
- DeepSeek V4 Pro occasionally writes syntactically perfect code that does the opposite of what was asked, and the mistake can survive a quick skim.
- Neither cheap model can match Opus for debugging deep issues (e.g., auth flow silently eating a cookie).
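One way to blunt the hallucinated-parameter failure is to check each tool call's arguments against a strict schema before executing it, rejecting unknown or mistyped fields. The sketch below is an illustration, not the author's fix (the post only says their schemas were sloppy); the `rename_css_var` schema and its fields are hypothetical, and a real setup would more likely use JSON Schema with `additionalProperties: false`.

```python
from typing import Any

# Hypothetical schema for a CSS-variable rename tool: typed params, no extras allowed.
RENAME_CSS_VAR_SCHEMA = {
    "required": {"file": str, "old_name": str, "new_name": str},
    "optional": {"dry_run": bool},
}


def validate_args(schema: dict, args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    problems = []
    allowed = {**schema["required"], **schema.get("optional", {})}
    # Reject parameters the model invented.
    for name in args:
        if name not in allowed:
            problems.append(f"unknown parameter: {name}")
    # Every required parameter must be present with the right type.
    for name, expected in schema["required"].items():
        if name not in args:
            problems.append(f"missing parameter: {name}")
        elif not isinstance(args[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Optional parameters are type-checked only when supplied.
    for name, expected in schema.get("optional", {}).items():
        if name in args and not isinstance(args[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


call = {"file": "theme.css", "old_name": "--primary", "new_name": "--brand", "recursive": True}
print(validate_args(RENAME_CSS_VAR_SCHEMA, call))
```

A non-empty result can be fed back to the cheap model as an error message for a retry, or used as a signal to escalate that step to the stronger model.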
Decision Heuristic
The author's routing rule of thumb: "How expensive is a wrong answer to catch?" A bad lint fix costs a 2-second git revert; a bad architecture call costs the whole afternoon.
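The rule of thumb can be expressed as a threshold on the estimated cost of catching a mistake. This is a sketch under stated assumptions: the `minutes_to_catch` estimate and the 5-minute cutoff are hypothetical values chosen for illustration, not numbers from the post.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    # Hypothetical, hand-estimated minutes to notice and undo a wrong answer.
    minutes_to_catch: float


# Hypothetical cutoff: mistakes cheaper than this to catch go to the cheap tier.
CHEAP_TIER_MAX_MINUTES = 5.0


def pick_tier(task: Subtask) -> str:
    """Route by how expensive a wrong answer would be to catch."""
    return "cheap" if task.minutes_to_catch <= CHEAP_TIER_MAX_MINUTES else "opus"


tasks = [
    Subtask("lint fix", 2 / 60),             # ~2-second git revert
    Subtask("auth middleware bug", 60),      # the hour-long trace from the post
    Subtask("architecture call", 240),       # "the whole afternoon"
]
for t in tasks:
    print(f"{t.name}: {pick_tier(t)}")
```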
The savings also made previously skipped tasks worthwhile, such as writing and running tests for every CSS change or regenerating all Open Graph images: at fractions of a cent per tool call, there is no reason not to.
📖 Read the full source: r/ClaudeAI