MCP Context Bloat: Real Costs and a Practical Fix for Claude Code Users

A Reddit user running 9 MCP servers in Claude Code for four months detailed the hidden costs and performance degradation they encountered, along with a concrete fix. The post is a must-read for anyone using MCP in production.
The Math
With 9 servers (filesystem, GitHub, Stripe, Linear, Notion, Postgres, Sentry, AWS, and custom) exposing 142 tools in total, the cold-start context (system prompt plus tool schemas) comes to 38k tokens, and it is resent on every turn. At 200 turns/day, that is 7.6M input tokens/day. At Sonnet's input price (~$3/M; output is ~$15/M), that works out to ~$23/day, or ~$700/month, spent on MCP tool definitions alone, before any actual work. Prompt caching only helps when the prefix is identical; rotating a single MCP server invalidates it.
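The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using the figures quoted in the post; the 30-day month and the function names are assumptions made for illustration.

```python
# Figures quoted in the post; the 30-day month is an assumption.
TOKENS_PER_TURN = 38_000        # system prompt + tool schemas
TURNS_PER_DAY = 200
INPUT_PRICE_PER_MTOK = 3.00     # USD, approximate Sonnet input price
DAYS_PER_MONTH = 30


def daily_overhead_tokens(tokens_per_turn: int, turns_per_day: int) -> int:
    """Tokens spent per day on tool definitions alone."""
    return tokens_per_turn * turns_per_day


def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Convert a token count to dollars at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


daily_tokens = daily_overhead_tokens(TOKENS_PER_TURN, TURNS_PER_DAY)
daily_cost = cost_usd(daily_tokens, INPUT_PRICE_PER_MTOK)
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"{daily_tokens:,} tokens/day, ${daily_cost:.2f}/day, ${monthly_cost:.2f}/month")
```

With these inputs the overhead is 7.6M tokens/day, about $22.80/day and $684 over 30 days, which the post rounds to ~$23 and ~$700. The sketch ignores cache discounts, so it is an upper bound for sessions where the prefix stays stable.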
What Breaks
- Tool selection degrades: With 142 tools in context, Claude started picking the wrong tool for obvious queries (e.g., using linear_search_issues when asked to read a file).
- Slow enumeration: Schema-heavy servers like AWS take 4–6 seconds to list tools.
- Silent error propagation: One poorly-described tool can taint the ranking for every related query.
The Fix: Gateway Pattern with BM25
The user switched to a gateway pattern using Ratel, an open-source, in-process Rust library with BM25 ranking. Claude now sees only three tools: search_tools, invoke_tool, and auth. Everything else is ranked on-demand. Results:
- Cold start dropped from 38k to ~4k tokens.
- Wrong-tool selection nearly eliminated because the model only ever sees the top 5 ranked by query.
- Setup took 10 minutes (one command does the Claude Code import).
The author notes that most "MCP optimizer" startups are just BM25 search dressed up. Tool descriptions are short, structured, and full of keyword matches — no vector DB or LLM-in-the-loop needed. BM25 over a flat projection of name + description gets 90% of the win deterministically in microseconds, offline.
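To make "BM25 over a flat projection of name + description" concrete, here is a minimal sketch of Okapi BM25 in plain Python. It is not Ratel's implementation or API: the tool catalog, the tokenizer, the `BM25Index` class, and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical catalog; names and descriptions are illustrative only.
TOOLS = [
    {"name": "fs_read_file", "description": "Read the contents of a file from the local filesystem"},
    {"name": "linear_search_issues", "description": "Search Linear issues by keyword, assignee, or status"},
    {"name": "github_create_pull_request", "description": "Open a pull request on a GitHub repository"},
    {"name": "sentry_list_errors", "description": "List recent error events for a Sentry project"},
]


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so "fs_read_file" -> fs, read, file.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class BM25Index:
    def __init__(self, tools, k1=1.5, b=0.75):
        self.tools = tools
        self.k1, self.b = k1, b
        # Flat projection: one document per tool, name + description.
        self.docs = [tokenize(t["name"] + " " + t["description"]) for t in tools]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: number of tools containing each term.
        self.df = Counter(term for d in self.docs for term in set(d))

    def idf(self, term):
        n, df = len(self.docs), self.df.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query_terms, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=5):
        q = tokenize(query)
        scored = [(self.score(q, i), t["name"]) for i, t in enumerate(self.tools)]
        scored = [s for s in scored if s[0] > 0]   # drop tools with no term overlap
        scored.sort(key=lambda s: -s[0])           # highest score first
        return [name for _, name in scored[:top_k]]


index = BM25Index(TOOLS)
results = index.search("read a file")
print(results)
```

For the query "read a file", `fs_read_file` ranks first and `linear_search_issues` does not appear at all, because it shares no terms with the query. Everything here is deterministic and runs in memory, which is the property the author highlights. A production index would likely add stop-word handling and stemming.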
Key lesson: "replace" beats "suggest". If your gateway hands the model 5 tools instead of 142, the math works. If it suggests 5 alongside 142, the model still loads 142 and you saved nothing.
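The difference between the two modes is easy to see in a rough token model. The sketch below assumes every tool schema costs the same number of tokens; the 250-token figure is hypothetical, picked only so that 142 schemas land near the post's ~38k once the system prompt is added.

```python
# Assumption: uniform schema size. 250 tokens per schema is hypothetical.
TOKENS_PER_SCHEMA = 250
CATALOG_SIZE = 142    # tools across all servers (from the post)
GATEWAY_TOOLS = 3     # search_tools, invoke_tool, auth (from the post)
TOP_K = 5             # ranked tools surfaced per query (from the post)


def schema_tokens_per_turn(mode: str) -> int:
    """Rough schema overhead per turn for each gateway mode."""
    if mode == "replace":
        # Only the gateway tools plus the top-k matches are ever shown.
        return (GATEWAY_TOOLS + TOP_K) * TOKENS_PER_SCHEMA
    if mode == "suggest":
        # The full catalog stays loaded; suggestions are added on top.
        return (CATALOG_SIZE + TOP_K) * TOKENS_PER_SCHEMA
    raise ValueError(f"unknown mode: {mode!r}")


replace_tokens = schema_tokens_per_turn("replace")
suggest_tokens = schema_tokens_per_turn("suggest")
print(replace_tokens, suggest_tokens)
```

Under these assumptions, "replace" costs 2,000 schema tokens per turn while "suggest" costs 36,750, slightly more than having no gateway at all.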
📖 Read the full source: r/ClaudeAI