OpenClaw Cost Optimization: How a Developer Fixed a $750 Mistake with Model Routing

What Went Wrong with the Cost Fix
After burning $750 in 3 days on OpenRouter, the developer initially "fixed" costs by swapping everything to Hunter Alpha (free on OpenRouter). This caused subagents to return zero output — silent completions where jobs showed "success" but results were empty.
A specific failure case: a video production agent wrote code that syntax-checked correctly, ran without errors, but produced a 9-second silent black video with no voiceover, no footage, and no manifest. QA eventually caught it. The lesson: free models don't always fail loudly — sometimes they quietly ship a stub and move on.
The New Model Routing Strategy
The developer stopped thinking "cheap vs expensive" and started thinking "what does this task actually need":
- Main session (orchestration): Sonnet 4.6 — "The manager. Worth the cost."
- Code/complex tasks: Gemini 2.5 Flash at $0.15/M — "Sweet spot for real output."
- Sensitive data (credentials, financials): Claude 3.5 Haiku — "Anthropic doesn't log prompts. Non-negotiable."
- Simple predictable tasks: Hunter Alpha — "Fine when failure is obvious and stakes are low."
Every cron job and subagent spawn now has an explicit model parameter — no defaults.
Security Discovery During the Audit
While investigating the model issues, the developer found credentials committed in their workspace repo — API keys and OAuth tokens. Although not pushed publicly, this was unacceptable. They added a .gitignore for credentials/ and ran git rm --cached. The warning: if you've ever committed a credentials folder, those keys remain in your git history — rotate them.
The Core Lesson
Cost optimization isn't a one-time config change. A $0.15/M model writing your production pipeline is money well spent. A free model that silently passes you a broken video is expensive no matter what it costs per token. Right-size to the job and verify output, not just exit codes.
📖 Read the full source: r/openclaw
👀 See Also

Claude AI Used to Generate Performance Evaluation Document from User History
A developer used Claude AI to complete a 3-4 page performance evaluation document by asking it to 'complete this documentation using information you have about me.' The AI generated a detailed document in 5-6 minutes that included work contributions the user had almost forgotten.

Speculative Decoding Benchmarks on RTX 3090 with Qwen Models for HVAC Business Use
A developer tested speculative decoding on an RTX 3090 using Qwen models for an HVAC business Discord bot, achieving up to 279.9 tokens/sec with a 236% speedup using Qwen3-8B with a Qwen3-1.7B draft model.

Benchmark vs. Production: When AI Agent Tests Pass but Real Workflows Fail
A developer switched production AI agents from Claude Sonnet to cheaper Grok and MiniMax models after they passed benchmark tests, but both failed in production due to operational reliability issues not covered by the benchmarks.

Claude Partner Program: Two-Person Consultancy Solves 10-Person Requirement with Certified Independents
A two-person AI consultancy used Claude to get into Anthropic's Partner Program, then used it to recruit a bench of certified independents to meet the 10-person requirement.