Route Claude Code through Ollama and Cut Your Bill ~90%

This repo by Coherence Daddy provides a complete setup to route Claude Code terminal sessions through a local Ollama instance while keeping Claude Desktop on Anthropic's paid Pro tier. The result: a claimed ~90% reduction in Claude Code API costs.
How It Works
You run two engines side by side:
- Claude Desktop (Anthropic) – used for strategy, architecture, code review, and tricky bugs.
- Claude Code → Ollama – used for lints, refactors, repetitive edits, batch file ops, and grep-and-replace tasks. Runs on a free open-source model (Gemma, Qwen, DeepSeek, your choice).
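The split above can be sketched as a small lookup. This is only an illustration of the division of labor, not the repo's router: the function name `pick_engine` and the task labels are hypothetical, and the fallback to the paid engine for unknown tasks is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a task category to one of the two engines.
# The category labels are illustrative, not identifiers from the repo.
pick_engine() {
  case "$1" in
    lint|refactor|repetitive-edit|batch-file-op|grep-replace)
      echo "ollama" ;;      # mechanical work goes to the local model
    strategy|architecture|code-review|tricky-bug)
      echo "anthropic" ;;   # judgment-heavy work stays on Claude Desktop
    *)
      echo "anthropic" ;;   # assumption: unknown tasks default to the paid engine
  esac
}

pick_engine lint
pick_engine architecture
```

Defaulting unknown tasks to the paid engine trades some cost for safety; flipping the default would save more but risks sending hard problems to a weaker model.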
Setup Process
The repo includes a self-contained HTML presentation (21 slides) with a copy-paste prompt that does ~98% of the setup automatically. It auto-detects your OS (macOS, Windows + WSL2, Linux), installs everything, configures the router, and verifies both engines at the end.
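For a sense of what the OS auto-detection step might involve, here is a minimal sketch. It is not the prompt's actual logic: `classify_os` is a hypothetical name, and the check relies on the assumption that WSL kernels report "microsoft" in /proc/version, which does not distinguish WSL1 from WSL2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the platform from `uname -s` output and
# the contents of /proc/version (empty on systems without that file).
classify_os() {
  local kernel="$1" proc_version="$2"
  case "$kernel" in
    Darwin) echo "macos" ;;
    Linux)
      case "$proc_version" in
        *[Mm]icrosoft*) echo "wsl" ;;   # assumption: WSL kernels mention Microsoft
        *)              echo "linux" ;;
      esac ;;
    *) echo "unsupported" ;;
  esac
}

# Classify the machine this script is running on.
classify_os "$(uname -s)" "$(cat /proc/version 2>/dev/null || true)"
```

Taking the kernel name and version string as arguments, rather than reading them inside the function, keeps the classification easy to check on any machine.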
To run locally:
git clone https://github.com/Coherence-Daddy/use-ollama-to-enhance-claude.git
cd use-ollama-to-enhance-claude/presentation
open index.html # macOS, or drag into browser
Or use the copy-paste prompt directly from prompts/copy-paste-prompt.md.
Repository Structure
- prompts/copy-paste-prompt.md – the setup prompt.
- presentation/index.html – full visual deck (no build step required).
- Also hosted at coherencedaddy.com/tutorials/use-ollama-to-enhance-claude.
Why This Exists
Claude Pro on desktop is great for thinking and architecture, but Claude Code in the terminal burns through quota fast on context-heavy tasks. Routing those tasks through Ollama (local or cloud-hosted free models) keeps the same UX but at a fraction of the cost.
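The arithmetic behind a claim like ~90% is simple if the local model is treated as free: the saving equals the share of Claude Code spend that gets routed away. A rough sketch follows, with a hypothetical function name and purely illustrative numbers (not measurements from the repo); it ignores hardware and electricity costs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remaining monthly spend, in cents, after routing
# a given percentage of the spend to a local model assumed to cost nothing.
remaining_cost_cents() {
  local monthly_cents="$1" routed_pct="$2"
  echo $(( monthly_cents * (100 - routed_pct) / 100 ))
}

# Illustrative only: $100.00 per month with 90% of spend routed locally.
remaining_cost_cents 10000 90   # prints 1000
```

Read the other way, a ~90% reduction implies that roughly nine tenths of the original spend was going to tasks a local model can handle.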
License
MIT – free to use, fork, or remix.