MCP Context Bloat: Real Costs and a Practical Fix for Claude Code Users

By OpenClawRadar | Published: May 19, 2026

A Reddit user running 9 MCP servers in Claude Code for four months detailed the hidden costs and performance degradation they encountered, along with a concrete fix. The post is a must-read for anyone using MCP in production.

The Math

With 9 servers (filesystem, GitHub, Stripe, Linear, Notion, Postgres, Sentry, AWS, and a custom one) exposing 142 tools in total, the cold-start context comes to 38k tokens. That context is the system prompt plus the tool schemas, and it is resent on every turn. At 200 turns/day, that's 7.6M input tokens/day. At Sonnet's ~$3/M input price, that's ~$23/day, or roughly $700/month, spent on MCP tool definitions alone before any actual work. Output pricing (~$15/M) doesn't enter into it, since these are all input tokens. The figure also assumes no prompt-cache hits. Caching only helps on identical prefixes, and rotating a single MCP server invalidates the cache.
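
The arithmetic above is easy to rerun with your own numbers. Below is a minimal Python sketch. The function names are illustrative. The $3/M input price is the approximate Sonnet rate quoted in the post. The sketch also assumes a 30-day month and no prompt-cache discount.

```python
# Back-of-the-envelope cost of resending MCP tool schemas on every turn.
# Assumptions: flat per-million input price, no cache discount, 30-day month.


def daily_schema_tokens(tokens_per_turn: int, turns_per_day: int) -> int:
    """Input tokens spent on the static prefix each day."""
    return tokens_per_turn * turns_per_day


def schema_cost_usd(tokens: int, usd_per_million_input: float) -> float:
    """Convert an input-token count to dollars at a per-million rate."""
    return tokens / 1_000_000 * usd_per_million_input


if __name__ == "__main__":
    tokens = daily_schema_tokens(38_000, 200)  # 7,600,000 tokens/day
    per_day = schema_cost_usd(tokens, 3.0)     # about $22.80/day
    per_month = per_day * 30                   # about $684/month
    print(f"{tokens:,} tokens/day -> ${per_day:.2f}/day, ${per_month:.0f}/month")

    # Same calculation at the ~4k-token cold start reported after the fix.
    after = schema_cost_usd(daily_schema_tokens(4_000, 200), 3.0)
    print(f"after gateway: ${after:.2f}/day")
```

Plugging in your own per-turn token count and turn volume shows how quickly a static prefix comes to dominate the bill.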

What Breaks

  • Tool selection degrades: With 142 tools in context, Claude started picking the wrong tool for obvious queries (e.g., using linear_search_issues when asked to read a file).
  • Slow enumeration: Schema-heavy servers like AWS take 4–6 seconds to list tools.
  • Silent error propagation: One poorly-described tool can taint the ranking for every related query.

The Fix: Gateway Pattern with BM25

The user switched to a gateway pattern using Ratel, an open-source, in-process Rust library that ranks tools with BM25. Claude now sees only three tools: search_tools, invoke_tool, and auth. Every other tool stays behind the gateway and is ranked on demand against the current query. Results:

  • Cold start dropped from 38k to ~4k tokens.
  • Wrong-tool selection nearly eliminated because the model only ever sees the top 5 ranked by query.
  • Setup took 10 minutes (a single command handles the Claude Code import).
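
Here is a minimal Python sketch that makes the pattern concrete. The gateway keeps backend tools out of the model's context. This is not Ratel's API: Ratel is a Rust library, and the post doesn't describe its interface. The Tool and Gateway classes, the placeholder word-overlap ranker, and the stubbed auth method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., Any]


@dataclass
class Gateway:
    """Hypothetical gateway: the model only ever sees three meta-tools."""

    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def exposed_tools(self) -> list[str]:
        # The only tool names placed in the model's context, however many are registered.
        return ["search_tools", "invoke_tool", "auth"]

    def search_tools(self, query: str, k: int = 5) -> list[dict[str, str]]:
        # Placeholder ranker: count name/description words that also appear in the query.
        # The post's setup uses BM25 at this step instead.
        query_words = set(query.lower().split())

        def score(tool: Tool) -> int:
            words = f"{tool.name.replace('_', ' ')} {tool.description}".lower().split()
            return sum(1 for w in words if w in query_words)

        ranked = sorted(self.tools.values(), key=score, reverse=True)
        return [
            {"name": t.name, "description": t.description}
            for t in ranked[:k]
            if score(t) > 0
        ]

    def invoke_tool(self, name: str, **kwargs: Any) -> Any:
        # Dispatch to the backend tool the model picked from the search results.
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name].handler(**kwargs)

    def auth(self, server: str) -> str:
        # Credential handling is out of scope for this sketch.
        raise NotImplementedError(f"auth flow for {server!r} not implemented")


if __name__ == "__main__":
    gw = Gateway()
    gw.register(Tool("fs_read_file", "Read a file from the local filesystem",
                     lambda path: f"<contents of {path}>"))
    gw.register(Tool("linear_search_issues", "Search Linear issues by text",
                     lambda query: []))
    print(gw.exposed_tools())
    print([t["name"] for t in gw.search_tools("read a file")])  # ['fs_read_file']
    print(gw.invoke_tool("fs_read_file", path="README.md"))
```

The property that matters is that exposed_tools() stays at three entries no matter how many backends are registered. Backend schemas only reach the model as the handful of search results it asks for.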

The author argues that most "MCP optimizer" startups are just BM25 search dressed up. Tool descriptions are short, structured, and full of keyword matches, so no vector DB or LLM-in-the-loop is needed. By the author's estimate, BM25 over a flat projection of each tool's name and description captures 90% of the benefit. It does so deterministically, offline, and in microseconds.
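
For readers who haven't implemented BM25, the sketch below ranks tools over a flat "name + description" string, as the post describes. It is plain Okapi BM25 in Python, not Ratel's code. A few choices here are assumptions for illustration:
  • k1 = 1.2 and b = 0.75 are common textbook defaults.
  • The IDF uses the non-negative log(1 + ...) variant found in Lucene.
  • The tokenizer and the sample tool names are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so "linear_search_issues"
    # becomes ["linear", "search", "issues"].
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over one flat 'name + description' document per tool."""

    def __init__(self, tools: dict[str, str], k1: float = 1.2, b: float = 0.75):
        if not tools:
            raise ValueError("need at least one tool to index")
        self.k1, self.b = k1, b
        self.names = list(tools)
        self.docs = [Counter(tokenize(f"{name} {desc}")) for name, desc in tools.items()]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        # Document frequency: how many tools mention each term at least once.
        df = Counter(term for doc in self.docs for term in doc)
        n = len(self.docs)
        # Lucene-style IDF; stays positive even for terms found in most tools.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / denom
        return total

    def top_k(self, query: str, k: int = 5) -> list[str]:
        scored = [(self.score(query, i), name) for i, name in enumerate(self.names)]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [name for _, name in hits[:k]]


if __name__ == "__main__":
    tools = {
        "fs_read_file": "Read the contents of a file from the local filesystem",
        "linear_search_issues": "Search Linear issues by title or text",
        "github_create_pr": "Open a pull request on a GitHub repository",
        "postgres_query": "Run a read-only SQL query against Postgres",
    }
    index = BM25Index(tools)
    print(index.top_k("read a file"))  # fs_read_file ranks first
```

Everything here is deterministic and runs offline, which is the property the author highlights: the same query always yields the same shortlist. Note that linear_search_issues never surfaces for "read a file", the failure case described earlier.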

Key lesson: "replace" beats "suggest". If your gateway hands the model 5 tools instead of 142, the math works. If it suggests 5 alongside 142, the model still loads all 142 schemas and you save nothing.
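
The difference between the two designs comes down to what ends up in the prompt. The hypothetical helper below is a minimal Python sketch that just returns the schemas sent under each mode. It ignores per-schema token sizes and assumes every loaded schema costs context.

```python
from typing import Literal


def schemas_sent(all_tools: list[str], ranked: list[str],
                 mode: Literal["replace", "suggest"]) -> list[str]:
    """Return the tool schemas that end up in the model's context."""
    if mode == "replace":
        # The gateway's shortlist is the entire tool surface the model sees.
        return list(ranked)
    if mode == "suggest":
        # The shortlist is only a hint; every schema is still loaded.
        return list(all_tools)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    all_tools = [f"tool_{i}" for i in range(142)]
    shortlist = all_tools[:5]  # stand-in for a ranker's top 5
    print(len(schemas_sent(all_tools, shortlist, "replace")))  # 5
    print(len(schemas_sent(all_tools, shortlist, "suggest")))  # 142
```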

Read the full source: r/ClaudeAI
