Cut Claude Costs 60x With DeepSeek V4 Flash Via MCP

A Reddit user analyzed their Claude usage and found the bulk of it went to mechanical tasks: classifying files, reformatting JSON, pulling fields from text, and summarizing docs they'd skim anyway. None of that needed Sonnet. The fix: a small cheap model running as a side worker via MCP, plus a single rule in CLAUDE.md telling Claude not to do those tasks.

Setup: an MCP tool + CLAUDE.md deny-list

The setup uses a single MCP tool that sends text and gets text back. Default model is DeepSeek V4 Flash (cheap, 1M context). The endpoint is one config line and works with any OpenAI-compatible provider (local ollama, vllm, lm studio). The repo is github.com/arizen-dev/deepseek-mcp (MIT, Python 3.10+).

The critical piece: the CLAUDE.md rule uses negative framing — a deny list, not a permission list. The user reports positive framing ("use DeepSeek for X") got ignored ~30% of the time. The deny list approach catches it reliably.

# In CLAUDE.md:
# do NOT use Claude for:
# - json formatting
# - field extraction
# - file classification
# - summarization you will review anyway

Results: 60x cost reduction

Over 3 weeks of real usage: 217 mechanical calls offloaded to DeepSeek V4 Flash, total spend $0.41. Same workload on Sonnet would have been roughly $7. That's a ~17x multiplier on just those tasks, and the user says overall bill dropped 60x when factoring in heavier tasks still on Sonnet.

How the side worker operates

The side worker is a supervised tool, not an agent — no tool calls, no file access, no chains. Latency is 3–25 seconds. You review the output. The whole shape is: send text, get text back, review, move on.