RouteLLM Setup for Cost-Effective AI Task Routing

✍️ OpenClawRadar📅 Published: March 9, 2026🔗 Source

Docker Compose Configuration for Hybrid AI Setup

A Reddit user posted a detailed Docker Compose setup that implements what they call "Poor Man's Superintelligence" - a hybrid AI system that routes tasks between local and cloud models based on complexity.

Core Components

The system uses four main services:

vscode-openwire: Uses image sendmeticket/vscode-openwire:1.0.0 with ports 3000 and 3030 exposed. This provides access to GitHub Copilot through OpenWire, though the source notes this may violate TOS and suggests using an available API key instead.
ollama: Runs ollama/ollama:latest with port 11434 exposed. It automatically pulls and serves the qwen3.5:4b model as the local "weak" model.
openroutellm: Uses image sendmeticket/openroutellm:1.0.0 on port 6060. This is the routing service that decides which model handles each request.
openclaw: Runs ghcr.io/openclaw/openclaw:latest with ports 18789 and 18790 exposed, serving as the main interface.

RouteLLM Configuration

The openroutellm service is configured with specific parameters:

python -m routellm.openai_server --routers bert --default-router-threshold 0.75 --port 6060 --openwire-base-url http://vscode-openwire:3030/v1 --ollama-base-url http://ollama:11434/v1 --strong-model gpt-4o --weak-model qwen3.5:4b

This setup uses BERT-based routing with a 0.75 threshold to determine when to send tasks to the "strong" model (GPT-4o) versus the local "weak" model (Qwen3.5:4b).

How It Works

The system routes difficult tasks to the paid GPT-4o model through OpenWire/Copilot, while simpler tasks are handled by the local Qwen3.5:4b model running in Ollama. This creates what the author describes as a "fail-safe, local-first AI model with low base intelligence but really high max intelligence."

All services are connected through a custom Docker network (openclaw_net with subnet 172.10.10.0/24) and include health checks to ensure service availability.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

Solo developer builds cross-platform desktop AI agent with mobile remote control in 3 weeks, ships to 40+ countries

A solo developer built Skales, a native desktop AI agent with 139+ tools and a mobile companion app for remote control — all in 3 weeks using Claude. The app runs on macOS, Windows, and Linux, is local-first and free, and already has active users in 40+ countries.

May 1, 2026, 10:18 AM UTC

OpenClawRadar

Tools

Printable Claude Code Cheat Sheet with Weekly Auto-Updates

A developer created a one-page printable cheat sheet for Claude Code using Claude itself, covering keyboard shortcuts, slash commands, workflows, skills system, memory/CLAUDE.md, MCP setup, CLI flags, and config files. The HTML file is auto-updated weekly via cron job with new features tagged as 'NEW'.

Mar 14, 2026, 05:45 AM UTC

OpenClawRadar

Tools

OpenClaw's AWS Deployment: A Focus on Automation

OpenClaw's tool allows for one-click deployment to AWS, simplifying cloud operations for developers using AI coding agents.

Feb 16, 2026, 01:45 PM UTC

OpenClawRadar

Tools

Session Inspector for Claude Code provides real-time visibility into AI agent operations

Vibeyard, an open-source terminal IDE that wraps Claude Code, has added a Session Inspector feature that provides real-time visibility into Claude Code sessions with timeline tracking, cost breakdowns, tool analytics, and context window monitoring.

Apr 13, 2026, 08:45 AM UTC

OpenClawRadar