How Routing Layers Cut MCP Token Waste by 90%

A post on r/ClaudeAI reports a subtle but costly issue: when multiple MCP servers are configured, every prompt loads all of them by default, even trivial queries. The user had 5–6 servers and didn't notice until checking token usage—prompts were burning tokens on loading irrelevant server definitions every single time.

Key Details

Every prompt loaded the full set of MCP servers (5–6 servers).
Even simple prompts (e.g., "What time is it?") triggered all server definitions.
Solution: a custom routing layer that selects only the servers relevant to the prompt.
Result: token usage dropped significantly, and prompt response times improved.
The OP admitted they "cannot believe they let it go on that long without checking."

Technical Context

MCP (Model Context Protocol) servers are tools that extend Claude's capabilities (e.g., file system access, database queries, web scraping). The default behavior in many setups—including forked clients and manual configs—is to send the entire list of server definitions with each message. This means tools for DB access, file I/O, web browsing, etc. are all dumped into the context window before the actual user input is processed.

A routing layer can inspect the user's message (or system prompt) and conditionally include only the MCP servers whose descriptions or tools match the intent. For example, a prompt mentioning a file path would activate file tools; a question about stock prices would load only the finance server. This avoids the token overhead of irrelevant server metadata.