RouteLLM Setup for Cost-Effective AI Task Routing

Docker Compose Configuration for Hybrid AI Setup
A Reddit user posted a detailed Docker Compose setup that implements what they call "Poor Man's Superintelligence" - a hybrid AI system that routes tasks between local and cloud models based on complexity.
Core Components
The system uses four main services:
- vscode-openwire: Uses image
sendmeticket/vscode-openwire:1.0.0with ports 3000 and 3030 exposed. This provides access to GitHub Copilot through OpenWire, though the source notes this may violate TOS and suggests using an available API key instead. - ollama: Runs
ollama/ollama:latestwith port 11434 exposed. It automatically pulls and serves theqwen3.5:4bmodel as the local "weak" model. - openroutellm: Uses image
sendmeticket/openroutellm:1.0.0on port 6060. This is the routing service that decides which model handles each request. - openclaw: Runs
ghcr.io/openclaw/openclaw:latestwith ports 18789 and 18790 exposed, serving as the main interface.
RouteLLM Configuration
The openroutellm service is configured with specific parameters:
python -m routellm.openai_server --routers bert --default-router-threshold 0.75 --port 6060 --openwire-base-url http://vscode-openwire:3030/v1 --ollama-base-url http://ollama:11434/v1 --strong-model gpt-4o --weak-model qwen3.5:4bThis setup uses BERT-based routing with a 0.75 threshold to determine when to send tasks to the "strong" model (GPT-4o) versus the local "weak" model (Qwen3.5:4b).
How It Works
The system routes difficult tasks to the paid GPT-4o model through OpenWire/Copilot, while simpler tasks are handled by the local Qwen3.5:4b model running in Ollama. This creates what the author describes as a "fail-safe, local-first AI model with low base intelligence but really high max intelligence."
All services are connected through a custom Docker network (openclaw_net with subnet 172.10.10.0/24) and include health checks to ensure service availability.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Solo developer builds cross-platform desktop AI agent with mobile remote control in 3 weeks, ships to 40+ countries
A solo developer built Skales, a native desktop AI agent with 139+ tools and a mobile companion app for remote control — all in 3 weeks using Claude. The app runs on macOS, Windows, and Linux, is local-first and free, and already has active users in 40+ countries.

Printable Claude Code Cheat Sheet with Weekly Auto-Updates
A developer created a one-page printable cheat sheet for Claude Code using Claude itself, covering keyboard shortcuts, slash commands, workflows, skills system, memory/CLAUDE.md, MCP setup, CLI flags, and config files. The HTML file is auto-updated weekly via cron job with new features tagged as 'NEW'.

OpenClaw's AWS Deployment: A Focus on Automation
OpenClaw's tool allows for one-click deployment to AWS, simplifying cloud operations for developers using AI coding agents.

Session Inspector for Claude Code provides real-time visibility into AI agent operations
Vibeyard, an open-source terminal IDE that wraps Claude Code, has added a Session Inspector feature that provides real-time visibility into Claude Code sessions with timeline tracking, cost breakdowns, tool analytics, and context window monitoring.