RelayPlane Open Source Proxy Shows 73% Cost Reduction with Claude Model Routing

Open Source Proxy for Claude API Routing
RelayPlane is an open-source, npm-native proxy that sits in front of the Anthropic API. The author built it with Claude Code, which sped up development. It is free to self-host and routes each request to a Claude model chosen according to prompt complexity.
Benchmark Results and Configuration
The benchmark used a mixed workload with 60% simple tasks and 40% complex tasks. Two scenarios were compared:
- Direct (all Sonnet): p50 latency 1.55s, cost per 10 requests $0.0323
- Via RelayPlane with routing: p50 latency 0.78s, cost per 10 requests $0.0086
Routing cut the cost per 10 requests from $0.0323 to $0.0086, a 73.4% reduction, and roughly halved p50 latency. At 10,000 requests per day, this translates to approximately $712 in monthly savings.
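The arithmetic behind these figures can be checked with a short sketch. The per-10-request costs and the daily volume come from the benchmark above; the 30-day month and the function names are assumptions made for illustration.

```python
# Figures reported in the benchmark above.
DIRECT_COST_PER_10 = 0.0323   # USD per 10 requests, all Sonnet
ROUTED_COST_PER_10 = 0.0086   # USD per 10 requests, via RelayPlane
REQUESTS_PER_DAY = 10_000

# Assumption: a 30-day month (the source does not state the month length).
DAYS_PER_MONTH = 30


def cost_reduction(direct: float, routed: float) -> float:
    """Fractional cost reduction of the routed setup versus the direct one."""
    return (direct - routed) / direct


def monthly_savings(direct_per_10: float, routed_per_10: float,
                    requests_per_day: int, days: int) -> float:
    """Savings in USD over `days` days at a fixed daily request volume."""
    saved_per_request = (direct_per_10 - routed_per_10) / 10
    return saved_per_request * requests_per_day * days


reduction = cost_reduction(DIRECT_COST_PER_10, ROUTED_COST_PER_10)
savings = monthly_savings(DIRECT_COST_PER_10, ROUTED_COST_PER_10,
                          REQUESTS_PER_DAY, DAYS_PER_MONTH)

print(f"Cost reduction: {reduction:.1%}")
print(f"Monthly savings: ${savings:.2f}")
```

With a 30-day month this comes to about $711, in line with the roughly $712 quoted; the small gap likely reflects a slightly longer average month or rounding in the source.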
Routing Configuration
The routing configuration is straightforward:
{
  "routing": {
    "complexity": {
      "enabled": true,
      "simple": "claude-haiku-4-5",
      "moderate": "claude-sonnet-4-6",
      "complex": "claude-opus-4-6"
    }
  }
}
The routing logic uses a complexity classifier that examines token count, code indicators, and analytical keywords. Response headers include x-relayplane-routed-model to verify which model actually processed the request.
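To make the idea concrete, here is a minimal sketch of a heuristic classifier built on the three signals mentioned above. It is not RelayPlane's actual implementation: the thresholds, the keyword list, the regular expression, the whitespace-based token estimate, and all function names are assumptions chosen for illustration. Only the tier-to-model mapping and the header name come from the source.

```python
import re

# Tier-to-model mapping taken from the routing configuration above.
ROUTING = {
    "simple": "claude-haiku-4-5",
    "moderate": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}

# Hypothetical code indicators: fenced blocks and common definition keywords.
CODE_PATTERN = re.compile(r"```|\bdef\s+\w+\s*\(|\bfunction\b|\bclass\s+\w+")

# Hypothetical analytical keywords.
ANALYTICAL_KEYWORDS = ("analyze", "compare", "explain why", "trade-off", "prove")


def classify(prompt: str) -> str:
    """Score a prompt on length, code indicators, and analytical keywords."""
    score = 0
    approx_tokens = len(prompt.split())  # crude stand-in for a real tokenizer
    if approx_tokens > 200:
        score += 2
    elif approx_tokens > 50:
        score += 1
    if CODE_PATTERN.search(prompt):
        score += 1
    if any(keyword in prompt.lower() for keyword in ANALYTICAL_KEYWORDS):
        score += 1
    if score >= 3:
        return "complex"
    if score >= 1:
        return "moderate"
    return "simple"


def route(prompt: str) -> str:
    """Return the model name configured for the prompt's complexity tier."""
    return ROUTING[classify(prompt)]


def routed_model(headers: dict) -> "str | None":
    """Read x-relayplane-routed-model from response headers, case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-relayplane-routed-model":
            return value
    return None


print(route("What is the capital of France?"))
print(route("Explain why this is slow: def fib(n): return fib(n-1) + fib(n-2)"))
print(routed_model({"X-RelayPlane-Routed-Model": "claude-haiku-4-5"}))
```

A score-based design like this is cheap to run on every request, which matters for a proxy on the hot path. Comparing `route(prompt)` against the header value returned by the proxy is one way to spot prompts where a local guess and the real classifier disagree.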
Model Pricing and Routing Logic
The routing system directs prompts to appropriate models based on complexity:
- Simple prompts → Haiku ($0.80 per million tokens)
- Moderate prompts → Sonnet ($3 per million tokens)
- Complex prompts → Opus ($15 per million tokens)
The author notes the classifier isn't perfect but is "good enough to capture most of the savings." The full benchmark methodology is available in a Gist linked in the source material.
📖 Read the full source: r/ClaudeAI