Routerly: Self-Hosted LLM Gateway with Runtime Routing Policies and Budget Control

✍️ OpenClawRadar📅 Published: April 19, 2026🔗 Source

Routerly is a self-hosted LLM gateway built to address gaps in existing solutions. The developer created it because OpenRouter is cloud-based, and they wanted something runnable on their own infrastructure, while LiteLLM's routing felt too manual despite handling budgeting well.

Core Features

Instead of hardcoding a specific model in your application, Routerly lets you define routing policies that determine model selection at runtime. Available policies include:

Cheapest
Fastest
Most capable
Combinations of these policies

Budget control operates at the project level with actual per-token tracking, providing granular cost management.

Compatibility and Use

Routerly is OpenAI-compatible, meaning it can drop into existing workflows without code changes. Specifically mentioned compatible tools include:

Cursor
LangChain
Open WebUI

It works with "anything else" that uses the OpenAI API format.

Current Status

The developer acknowledges there are rough edges and is seeking community feedback on:

What's broken
What's missing
Whether the routing logic makes sense in practice
Whether it solves a real problem people have

The tool is completely free and open source, with no commercial sales pitch. The developer is focused on practical feedback from the technical community.

Resources

GitHub Repository: https://github.com/Inebrio/Routerly
Website: https://www.routerly.ai

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

OpenClaw Kubernetes Operator with Embedded Ollama Support

A community member has created an OpenClaw Kubernetes operator that includes embedded Ollama support, allowing AI agents to run with local models in the same namespace. The setup includes installation commands, configuration details for both local and cloud Ollama models, and dashboard access instructions.

Mar 23, 2026, 11:45 PM UTC

OpenClawRadar

Tools

DeepMind DiscoRL Meta Learning Update Rule Ported from JAX to PyTorch

A developer has ported DeepMind's DiscoRL meta learning update rule from the 2025 Nature article from JAX to PyTorch. The implementation includes a GitHub repository with a Colab notebook, API, and weights hosted on Hugging Face.

Mar 9, 2026, 05:45 PM UTC

OpenClawRadar

Tools

Reddit user measures MCP token overhead: 67K tokens consumed before any question

A developer measured their MCP server token overhead at 67,000 tokens consumed before typing a single question, with Playwright MCP using 13,600 tokens and GitHub MCP using 18,000 tokens idle. They replaced MCP with skills and CLI tools for lower context costs.

Mar 23, 2026, 05:45 PM UTC

OpenClawRadar

Tools

Docent: An AI Assistant for Paper Analysis Built with Claude Code

A developer created Docent, an AI assistant that reads uploaded papers, presents them, answers questions, and assesses understanding using Claude Code. The project is available on GitHub under MIT License with a demo on Vercel.

Apr 19, 2026, 03:45 AM UTC

OpenClawRadar