Open-source local hook automatically switches Claude models to cut AI costs

A developer has open-sourced a local hook that automatically selects the most cost-effective Claude AI model based on the type of coding task, potentially reducing AI costs by 50-70% without quality loss.
How it works
The tool runs as a local hook in Cursor and Claude Code (both use the same hook system) and fires before each prompt is sent. It sits alongside Opus/plan and acts as a lightweight front-end filter that catches obviously mismatched model choices before the prompt reaches the model.
Key functionality
- Reads the prompt and current model selection
- Uses simple keyword rules to classify tasks (git operations, feature work, architecture/deep analysis)
- Blocks if you're overpaying (e.g., Opus for git commit) and suggests Haiku or Sonnet
- Blocks if you're underpowered (Sonnet/Haiku for architecture) and suggests Opus
- Lets everything else through unchanged
- A leading ! on the prompt bypasses the filter completely if you disagree with its suggestion
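
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the keyword patterns, the function names classify and check, the model name strings, and the shape of the returned verdict are all assumptions made for the example. Only the two blocking cases named in the list are modeled.

```python
import re

# Hypothetical keyword rules; the real project's rules may differ.
# Word boundaries keep "format" from matching inside "information".
LIGHT = re.compile(r"\b(git (commit|push|rebase)|rename|format(ting)?)\b")
HEAVY = re.compile(r"\b(architecture|deep analysis)\b")


def classify(prompt: str) -> str:
    """Return 'heavy', 'light', or 'standard' for a prompt."""
    text = prompt.lower()
    if HEAVY.search(text):
        return "heavy"
    if LIGHT.search(text):
        return "light"
    return "standard"


def check(prompt: str, model: str) -> dict:
    """Decide whether to let the prompt through on the selected model."""
    # A leading "!" skips the filter entirely.
    if prompt.lstrip().startswith("!"):
        return {"action": "allow", "reason": "bypass"}

    tier = classify(prompt)
    model = model.lower()

    # Overpaying: trivial task on the most expensive model.
    if tier == "light" and "opus" in model:
        return {"action": "block", "suggest": "haiku or sonnet"}

    # Underpowered: deep analysis on a smaller model.
    if tier == "heavy" and ("sonnet" in model or "haiku" in model):
        return {"action": "block", "suggest": "opus"}

    # Everything else passes through unchanged.
    return {"action": "allow", "reason": "no rule matched"}
```

In this sketch, check("git commit these changes", "opus") returns a block verdict suggesting a cheaper model, while the same prompt with a leading ! is allowed through.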
Technical details
- 3 files: bash + python3 + JSON
- No proxy, no API calls, no external services
- Fail-open design: if it hangs, Claude Code proceeds normally
- Open-sourced at: https://github.com/coyvalyss1/model-matchmaker
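
One way to get the fail-open behavior described above is to run the decision logic under a time limit and treat any error or overrun as "allow". The sketch below assumes a hypothetical decide callable and verdict dictionary; it shows the general pattern and is not taken from the repository.

```python
import threading


def run_fail_open(decide, payload, timeout_s=1.0):
    """Return decide(payload), or an 'allow' verdict if it raises or overruns.

    decide and the verdict shape are hypothetical, used only for illustration.
    """
    result = {}

    def worker():
        try:
            result["verdict"] = decide(payload)
        except Exception:
            # Swallow classifier errors so the caller falls back to 'allow'.
            pass

    # A daemon thread cannot keep the process alive if decide() hangs.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    # No verdict recorded means an error or a timeout: let the prompt through.
    return result.get("verdict", {"action": "allow", "reason": "fail-open"})
```

The design choice here is that a broken or slow filter should never cost the developer a prompt: the worst case is that a mismatched model choice goes through unchecked.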
Performance and testing
The developer analyzed several weeks of their own prompts and found:
- 60-70% were standard feature work Sonnet could handle
- 5-20% were debugging/troubleshooting
- A significant portion were pure git/rename/formatting tasks that Haiku handles identically at 90% less cost
Retroactive analysis showed the tool would have cut 50-70% of AI spend with no quality drop. After tuning, it correctly handled 12/12 real test prompts.
Problem it solves
The issue is friction, not knowledge: developers already know they should switch models, but in a flow state they don't want to think about dropdown menus. This tool automates the decision.
📖 Read the full source: r/ClaudeAI