Routerly: Self-Hosted LLM Gateway with Runtime Routing Policies and Budget Control

Routerly is a self-hosted LLM gateway built to address gaps in existing solutions. The developer created it because OpenRouter is cloud-based, and they wanted something runnable on their own infrastructure, while LiteLLM's routing felt too manual despite handling budgeting well.
Core Features
Instead of hardcoding a specific model in your application, Routerly lets you define routing policies that determine model selection at runtime. Available policies include:
- Cheapest
- Fastest
- Most capable
- Combinations of these policies
Budget control operates at the project level with actual per-token tracking, providing granular cost management.
Compatibility and Use
Routerly is OpenAI-compatible, meaning it can drop into existing workflows without code changes. Specifically mentioned compatible tools include:
- Cursor
- LangChain
- Open WebUI
It works with "anything else" that uses the OpenAI API format.
Current Status
The developer acknowledges there are rough edges and is seeking community feedback on:
- What's broken
- What's missing
- Whether the routing logic makes sense in practice
- Whether it solves a real problem people have
The tool is completely free and open source, with no commercial sales pitch. The developer is focused on practical feedback from the technical community.
Resources
- GitHub Repository: https://github.com/Inebrio/Routerly
- Website: https://www.routerly.ai
📖 Read the full source: r/LocalLLaMA
👀 See Also

PACT: A Programmatic Governance Framework for Claude Code After Agent Failure Patterns
A developer built PACT (Programmatic Agent Constraint Toolkit) after three months of recurring Claude Code failures on a 350+ file mobile app. The framework replaces unenforceable rules with mechanical constraints that physically block violations through pre-tool-use hooks.

blend-ai: New Blender MCP Service for Claude Code
blend-ai is a new Blender MCP service that allows Claude Code to generate 3D scenes. A user reported it worked faster and better than blender-mcp, creating a shuttle launch scene from reference images in 5 minutes.

Claude Desktop App Cowork Function Enables AI-to-AI Communication via Shared Google Docs
Users successfully implemented Claude-to-Claude communication using the new cowork function in the desktop app, with two AI agents reading and writing to a shared Google Doc in a structured five-exchange dialogue.

PACT 0.4.0 adds compound intelligence for AI coding agents
PACT (Programmatic Agent Constraint Toolkit) version 0.4.0 introduces compound intelligence features that help AI coding agents retain knowledge across sessions. The update includes research synthesis, a knowledge directory, and capability self-awareness systems.