Semble: A Local MCP Server for Claude Code with 98% Token Reduction

Semble is an MCP server that lets Claude Code search local codebases efficiently, returning only the relevant code chunks instead of full files. It combines static embeddings, BM25, and a code-optimized reranking stack, all running locally on CPU, with no API keys, no GPU, and no heavy dependencies.
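To make the idea of hybrid search concrete, here is a minimal sketch of one common way to merge a keyword (BM25) ranking with an embedding ranking: reciprocal rank fusion. This is an illustrative assumption, not Semble's documented method; the function name, the constant k=60, and the chunk IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each list contributes 1 / (k + rank) per chunk, so chunks that rank
    well in more than one list rise to the top. Hypothetical helper; the
    fusion Semble actually uses is not described in the source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs, as a keyword ranker and an embedding ranker
# might return them for the same query.
bm25_ranking = ["auth.py:10", "db.py:42", "utils.py:7"]
embedding_ranking = ["db.py:42", "auth.py:10", "cache.py:3"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

Chunks that appear near the top of both lists end up ahead of chunks that only one ranker found, which is the basic reason a hybrid can beat either signal alone.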
Installation
Install via uvx:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
Once installed, Claude Code can search any repo — local or remote — directly.
Key Details
- Token reduction: Uses ~98% fewer tokens than the typical grep+read approach.
- Performance: Indexes any repo in ~250ms, answers queries in ~1.5ms (all on CPU).
- Quality: Reaches an NDCG@10 of 0.854, about 99% of the score of the best transformer hybrid tested, while being ~200x faster.
- Benchmarked against: grepai, probe, colgrep, and other existing methods.
- Open source: Available on GitHub under the MinishLab organization.
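For readers unfamiliar with the quality metric above, the following sketch shows how NDCG@10 is usually computed for a single query. It uses the standard linear-gain definition and assumes every relevant item already appears in the ranked list; the function names and relevance labels are illustrative, and this is not the benchmark code behind the 0.854 figure.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 is undiscounted.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering.

    Assumes `relevances` covers all judged-relevant items for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for one query, in the order results were returned:
# relevant, not relevant, relevant.
score = ndcg_at_k([1, 0, 1])
print(round(score, 4))
```

A score of 1.0 means the results are in the best possible order; a benchmark figure such as the one quoted above is typically the mean of this value over many queries.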
Who It's For
Developers using Claude Code on large codebases who want to reduce token burn and latency while getting high-quality code search results without external API calls.
📖 Read the full source: r/ClaudeAI