Context Mode MCP Server Cuts Claude Code Context Usage by 98%

Context Mode is an MCP server that sits between Claude Code and tool outputs, reducing context window consumption by 98%. Instead of dumping raw data into the 200K-token context window, it processes outputs in isolated sandboxes and passes only the results back.
How It Works
The sandbox system spawns isolated subprocesses for each execute call. Scripts run in these subprocesses with their own process boundaries, and only stdout enters the conversation context. Raw data like log files, API responses, and snapshots never leaves the sandbox.
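The stdout-only boundary described above can be illustrated with a minimal Python sketch. This is not Context Mode's actual implementation: the `run_isolated` helper, the temporary working directory, and the sample task are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(script: str, timeout: float = 30.0) -> str:
    """Run `script` in a separate Python process and return only its stdout.

    Hypothetical helper: the real server's execute tool may work differently.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script_path = Path(workdir) / "task.py"
        script_path.write_text(script, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(script_path)],
            cwd=workdir,          # scratch files stay in the throwaway directory
            capture_output=True,  # stdout and stderr are captured, not streamed
            text=True,
            timeout=timeout,
        )
    # stderr and anything written to workdir are dropped; only stdout is returned.
    return result.stdout


# Hypothetical task: build a large "log" inside the child and print a one-line summary.
TASK = """
lines = [f"GET /item/{i} {200 if i % 10 else 500}" for i in range(500)]
errors = sum(1 for line in lines if line.endswith(" 500"))
print(f"{len(lines)} requests, {errors} errors")
"""

summary = run_isolated(TASK)
print(summary.strip())
```

The 500 generated log lines exist only inside the child process; the caller sees a single summary line.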
Ten language runtimes are available: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, and R. Bun is auto-detected for 3-5x faster JS/TS execution. Authenticated CLIs (gh, aws, gcloud, kubectl, docker) work through credential passthrough where subprocesses inherit environment variables and config paths without exposing them to the conversation.
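Credential passthrough of this kind can be sketched as follows. The variable name `DEMO_API_TOKEN` and its value are hypothetical stand-ins for a real CLI credential, and the sketch assumes nothing about how Context Mode itself selects which variables to forward.

```python
import os
import subprocess
import sys

# Hypothetical secret standing in for something like a CLI token.
os.environ["DEMO_API_TOKEN"] = "s3cr3t-value"

CHILD = """
import os
token = os.environ.get("DEMO_API_TOKEN", "")
# A real script would hand the token to an API or CLI; here it only reports on it.
print("authenticated" if token else "no credentials")
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    env=os.environ.copy(),  # the child inherits the parent's environment
    capture_output=True,
    text=True,
)
output = result.stdout.strip()
print(output)
```

The child can use the credential, but the value reaches the caller only if the script prints it, so the returned text can stay free of secrets.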
Knowledge Base Features
The index tool chunks markdown content by headings while keeping code blocks intact, then stores them in a SQLite FTS5 virtual table. Search uses BM25 ranking with Porter stemming applied at index time. When you call search, it returns exact code blocks with their heading hierarchy.
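A rough sketch of heading-based chunking plus FTS5 search, using Python's built-in `sqlite3` module. The `chunk_markdown` function, the table layout, and the sample document are assumptions for illustration, not the project's real schema, and the sketch requires an SQLite build with FTS5 enabled.

```python
import re
import sqlite3

FENCE = "`" * 3  # a markdown code fence, built this way to avoid nesting fences


def chunk_markdown(text):
    """Yield (heading_path, body) pairs, splitting on headings outside code fences."""
    path, body, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fence are never treated as headings, so code stays intact.
        match = None if in_fence else re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            if any(part.strip() for part in body):
                yield " > ".join(path), "\n".join(body).strip()
            level = len(match.group(1))
            path = path[: level - 1] + [match.group(2).strip()]
            body = []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        yield " > ".join(path), "\n".join(body).strip()


DOC = "\n".join([
    "# Setup",
    "Install the package.",
    "## Running tests",
    "Use the runner:",
    FENCE + "sh",
    "# run everything",
    "pytest -q",
    FENCE,
    "## Deploying",
    "Push the image to the registry.",
])

db = sqlite3.connect(":memory:")
# The porter tokenizer stems terms at index time, so "run" also matches "Running".
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(heading, body, tokenize='porter')")
db.executemany("INSERT INTO chunks VALUES (?, ?)", chunk_markdown(DOC))

rows = db.execute(
    "SELECT heading, body FROM chunks WHERE chunks MATCH ? "
    "ORDER BY bm25(chunks) LIMIT 3",  # bm25() returns lower values for better matches
    ("run",),
).fetchall()

for heading, body in rows:
    print(heading)
    print(body)
```

The `# run everything` line inside the fenced block is not mistaken for a heading, and the matching chunk comes back with its full heading path and the code block intact.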
The fetch_and_index tool extends this to URLs: fetch, convert HTML to markdown, chunk, and index. The raw page never enters context.
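The HTML-to-markdown step of such a pipeline might look like the sketch below. It is a deliberately tiny converter built on the standard library's `html.parser`, handling only headings and paragraphs; the class name and sample page are hypothetical, and the network fetch is left out so the example runs offline.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-markdown converter covering h1-h3 and p only (illustrative)."""

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": ""}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished markdown blocks
        self.open_tag = None  # block-level tag currently being collected
        self.parts = []       # text fragments of the current block
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.PREFIX:
            self.open_tag, self.parts = tag, []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.open_tag:
            text = " ".join("".join(self.parts).split())
            if text:
                self.blocks.append(self.PREFIX[tag] + text)
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag and not self.skip_depth:
            self.parts.append(data)


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


PAGE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Guide</h1><p>Install with <code>pip</code> first.</p>"
    "<script>track()</script><h2>Usage</h2><p>Run the tool.</p>"
    "</body></html>"
)

markdown = html_to_markdown(PAGE)
print(markdown)
```

The resulting markdown can then be chunked and indexed like any local file, while the original HTML is discarded.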
Performance Benchmarks
- Playwright snapshot: 56 KB → 299 B
- GitHub issues (20): 59 KB → 1.1 KB
- Access log (500 requests): 45 KB → 155 B
- Analytics CSV (500 rows): 85 KB → 222 B
- Git log (153 commits): 11.6 KB → 107 B
- Repo research (subagent): 986 KB → 62 KB (5 calls vs 37)
Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.
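As a quick check on how these figures relate to the headline number, the reductions can be computed directly. The sizes below are copied from the list above, with byte values converted to KB assuming 1 KB = 1000 bytes (the article does not say which convention it uses).

```python
# (before, after) sizes in KB, taken from the benchmark figures above.
# Assumption: byte values are converted with 1 KB = 1000 bytes.
SIZES_KB = {
    "Playwright snapshot": (56, 0.299),
    "GitHub issues (20)": (59, 1.1),
    "Access log (500 requests)": (45, 0.155),
    "Analytics CSV (500 rows)": (85, 0.222),
    "Git log (153 commits)": (11.6, 0.107),
    "Repo research (subagent)": (986, 62),
    "Full session": (315, 5.4),
}


def reduction_pct(before, after):
    """Percentage of the original size that never reaches the context window."""
    return 100 * (1 - after / before)


reductions = {
    name: round(reduction_pct(before, after), 1)
    for name, (before, after) in SIZES_KB.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}%")
```

The full-session figure works out to about 98.3%, which is where the headline 98% comes from; the subagent research case is lower, at about 93.7%.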
Installation
Two installation methods:
- Plugin Marketplace:
/plugin marketplace add mksglu/claude-context-mode
then
/plugin install context-mode@claude-context-mode
- MCP-only:
claude mcp add context-mode -- npx -y context-mode
After installation, restart Claude Code. Context Mode includes a PreToolUse hook that automatically routes tool outputs through the sandbox. Subagents learn to use batch_execute as their primary tool, and bash subagents get upgraded to general-purpose so they can access MCP tools.
The tool is open source under MIT license at github.com/mksglu/claude-context-mode.
📖 Read the full source: HN LLM Tools