MCP Context Bloat Fix: Cut 38k Tokens to 4k with BM25 Gateway

A Reddit user running 9 MCP servers in Claude Code for four months detailed the hidden costs and performance degradation they encountered, along with a concrete fix. The post is a must-read for anyone using MCP in production.

The Math

With 9 servers (filesystem, GitHub, Stripe, Linear, Notion, Postgres, Sentry, AWS, and custom) exposing 142 tools total, the system prompt plus tool schemas come to 38k tokens, and that block is sent on every turn. At 200 turns/day, that's 7.6M input tokens/day. At Sonnet input pricing (~$3/M; output is ~$15/M but does not apply here), that's ~$23/day or ~$700/month just in MCP tool definitions, before any actual work. Prompt caching only helps on identical prefixes; rotating a single MCP server changes the prefix and invalidates the cache.
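The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the token count, turn count, and input price come from the post, while the 30-day month and the function name `overhead_cost` are assumptions made for illustration.

```python
# Figures from the post; DAYS_PER_MONTH is an assumption.
TOKENS_PER_TURN = 38_000       # system prompt + tool schemas
TURNS_PER_DAY = 200
INPUT_PRICE_PER_MTOK = 3.00    # approximate Sonnet input price, USD per 1M tokens
DAYS_PER_MONTH = 30


def overhead_cost(tokens_per_turn: int,
                  turns_per_day: int = TURNS_PER_DAY,
                  price_per_mtok: float = INPUT_PRICE_PER_MTOK,
                  days: int = DAYS_PER_MONTH) -> dict:
    """Cost of resending a fixed block of tokens on every turn."""
    tokens_per_day = tokens_per_turn * turns_per_day
    usd_per_day = tokens_per_day / 1_000_000 * price_per_mtok
    return {
        "tokens_per_day": tokens_per_day,
        "usd_per_day": usd_per_day,
        "usd_per_month": usd_per_day * days,
    }


cost = overhead_cost(TOKENS_PER_TURN)
print(f"{cost['tokens_per_day']:,} tokens/day, "
      f"${cost['usd_per_day']:.2f}/day, ${cost['usd_per_month']:.2f}/month")
# 7,600,000 tokens/day, $22.80/day, $684.00/month
```

The exact results ($22.80 and $684) round to the ~$23/day and ~$700/month quoted in the post.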

What Breaks

Tool selection degrades: With 142 tools in context, Claude started picking the wrong tool for obvious queries (e.g., using linear_search_issues when asked to read a file).
Slow enumeration: Schema-heavy servers like AWS take 4–6 seconds to list tools.
Silent error propagation: One poorly described tool can taint the ranking for every related query.

The Fix: Gateway Pattern with BM25

The user switched to a gateway pattern using Ratel, an open-source, in-process Rust library with BM25 ranking. Claude now sees only three tools: search_tools, invoke_tool, and auth. Everything else is ranked on-demand. Results:

Cold start dropped from 38k to ~4k tokens.
Wrong-tool selection nearly eliminated because the model only ever sees the top 5 ranked by query.
Setup took 10 minutes (one command does the Claude Code import).
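To make the shape of the gateway concrete, here is a minimal sketch in Python. It is not Ratel's API (Ratel is a Rust library and its interface is not shown in the post); the `Gateway` and `Tool` classes, the `read_file` tool, the token check in `auth`, and the word-overlap ranking are all hypothetical stand-ins. Only the three tool names and the top-5 cutoff come from the post.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., Any]
    server: str


class Gateway:
    """Hypothetical gateway: the model is shown only the three methods below."""

    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {t.name: t for t in tools}
        self._authed: set[str] = set()

    def search_tools(self, query: str, k: int = 5) -> list[dict[str, str]]:
        # Placeholder ranking (assumption): number of query words that
        # appear in the tool's name + description. A real gateway would
        # use BM25 here.
        words = set(query.lower().split())

        def score(t: Tool) -> int:
            text = f"{t.name} {t.description}".lower().replace("_", " ")
            return len(words & set(text.split()))

        ranked = sorted(self._tools.values(), key=score, reverse=True)
        return [{"name": t.name, "description": t.description}
                for t in ranked[:k] if score(t) > 0]

    def invoke_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        tool = self._tools[name]
        if tool.server not in self._authed:
            raise PermissionError(f"call auth() for server {tool.server!r} first")
        return tool.handler(**arguments)

    def auth(self, server: str, token: str) -> bool:
        # Assumption: any non-empty token is accepted; real credential
        # checks are out of scope for this sketch.
        if token:
            self._authed.add(server)
        return server in self._authed


gw = Gateway([
    Tool("read_file", "Read the contents of a file from disk",
         lambda path: f"<contents of {path}>", "filesystem"),
    Tool("linear_search_issues", "Search Linear issues by text",
         lambda query: [], "linear"),
])
hits = gw.search_tools("read a file")   # only read_file matches
gw.auth("filesystem", "example-token")
result = gw.invoke_tool(hits[0]["name"], {"path": "notes.txt"})
```

The point of the design is that the 142 schemas live inside the gateway's registry and never enter the prompt; the model pays only for the three gateway tools plus whatever `search_tools` returns.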

The author notes that most "MCP optimizer" startups are just BM25 search dressed up. Tool descriptions are short, structured, and full of keyword matches, so no vector DB or LLM-in-the-loop is needed. BM25 over a flat projection of name + description gets 90% of the win deterministically, in microseconds, offline.
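A sketch of what "BM25 over a flat projection of name + description" could look like, using only the Python standard library. This is an illustration of the scoring technique, not Ratel's implementation: the `BM25Index` class, the tokenizer, the sample tools other than `linear_search_issues`, the parameters `k1=1.5` and `b=0.75`, and the Lucene-style IDF variant are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so "read_file" -> ["read", "file"].
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    def __init__(self, tools: dict[str, str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.names = list(tools)
        # Flat projection: name and description concatenated into one document.
        self.docs = [Counter(tokenize(f"{n} {d}")) for n, d in tools.items()]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        # Document frequency: in how many tools each term appears.
        df = Counter(term for doc in self.docs for term in doc)
        n = len(self.docs)
        # IDF with +1 inside the log, which keeps every weight positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]:
        terms = tokenize(query)
        scored = []
        for name, doc, length in zip(self.names, self.docs, self.lengths):
            score = 0.0
            for term in terms:
                tf = doc.get(term, 0)
                if tf == 0:
                    continue
                # Term-frequency saturation with document-length normalization.
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf[term] * tf * (self.k1 + 1) / norm
            if score > 0:
                scored.append((name, score))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = BM25Index({
    "read_file": "Read the contents of a file from disk",
    "linear_search_issues": "Search Linear issues by text query",
    "stripe_create_refund": "Create a refund for a Stripe charge",
})
top = index.search("read a file")
```

With this catalog, `read_file` ranks first for "read a file" and `linear_search_issues` does not appear at all, since it shares no terms with the query. The index is built once from plain strings and each query is a handful of dictionary lookups, which is why this approach needs no embedding model or network call.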

Key lesson: "replace" beats "suggest". If your gateway hands the model 5 tools instead of 142, the math works. If it suggests 5 alongside 142, the model still loads 142 and you saved nothing.
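The difference between the two modes is easy to see numerically. In this rough sketch the 38k, 4k, 142, and top-5 figures come from the post; the average schema size is an assumption derived by treating the entire 38k as tool schemas, and `tokens_per_turn` is a hypothetical helper.

```python
FULL_CATALOG_TOKENS = 38_000   # from the post: system prompt + 142 tool schemas
GATEWAY_TOKENS = 4_000         # from the post: prompt with the three gateway tools
TOOL_COUNT = 142
TOP_K = 5

# Assumption: every tool schema is the same size and the whole 38k is schemas.
AVG_TOOL_TOKENS = FULL_CATALOG_TOKENS / TOOL_COUNT


def tokens_per_turn(mode: str) -> float:
    """Approximate fixed overhead per turn once a shortlist has been returned."""
    shortlist = TOP_K * AVG_TOOL_TOKENS
    if mode == "replace":
        # The shortlist is the only tool detail the model ever sees.
        return GATEWAY_TOKENS + shortlist
    if mode == "suggest":
        # The shortlist is added on top of the full catalog.
        return FULL_CATALOG_TOKENS + shortlist
    raise ValueError(f"unknown mode: {mode!r}")


replace_cost = tokens_per_turn("replace")
suggest_cost = tokens_per_turn("suggest")
```

Under these assumptions "suggest" ends up slightly more expensive than the original 38k setup, while "replace" stays well under a fifth of it.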

📖 Read the full source: r/ClaudeAI
