LLM Cost Profiler: Open-source tool tracks API spending to make case for local models

LLM Cost Profiler is an open-source Python tool that tracks every API call your code makes to OpenAI and Anthropic, showing exactly what you're spending, where, and why. The tool exposes which tasks are overpriced relative to their complexity, providing concrete data to make the case for local inference.
Key Features and Findings
The tool stores everything in local SQLite and is MIT licensed. According to the author, it surfaced several specific examples of wasted API spending:
- A classifier using GPT-4o that outputs one of 5 labels — a task any decent 7B local model handles easily. Cost: ~$89/week on API calls.
- Thousands of duplicate calls to the same prompt — zero caching. Local inference with caching would make this effectively free.
- A summarizer where 34% of calls were retries from format errors. A well-tuned local model with constrained generation eliminates this entire class of waste.
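The duplicate-call and retry findings above are the kind of numbers that can be derived from a local SQLite call log. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the `calls` table, its columns, and the helper names (`log_call`, `duplicate_waste`, `retry_share`) are all hypothetical and chosen only for illustration.

```python
import hashlib
import sqlite3


def build_demo_db():
    """Create an in-memory call log. The schema is hypothetical, not the tool's."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls ("
        "task TEXT, prompt_hash TEXT, cost_usd REAL, is_retry INTEGER)"
    )
    return conn


def log_call(conn, task, prompt, cost_usd, is_retry=False):
    """Record one API call, storing a hash of the prompt instead of its text."""
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO calls VALUES (?, ?, ?, ?)",
        (task, prompt_hash, cost_usd, int(is_retry)),
    )


def duplicate_waste(conn):
    """Estimate, per task, the spend on repeated identical prompts.

    For a prompt seen n times, (n - 1) calls are treated as avoidable,
    priced at that prompt's average cost per call.
    """
    rows = conn.execute(
        "SELECT task, COUNT(*), SUM(cost_usd) FROM calls "
        "GROUP BY task, prompt_hash HAVING COUNT(*) > 1"
    ).fetchall()
    waste = {}
    for task, n, total in rows:
        waste[task] = waste.get(task, 0.0) + total * (n - 1) / n
    return waste


def retry_share(conn, task):
    """Return the fraction of a task's calls that were flagged as retries."""
    retries, total = conn.execute(
        "SELECT COALESCE(SUM(is_retry), 0), COUNT(*) FROM calls WHERE task = ?",
        (task,),
    ).fetchone()
    return retries / total if total else 0.0
```

Hashing the prompt keeps the log compact and lets identical prompts be grouped with a plain `GROUP BY`. A real profiler would also need to decide what counts as "the same prompt" (for example, whether the model name and parameters are part of the key), which this sketch leaves out.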
The author notes this tool gives teams concrete ammunition for investing in local inference infrastructure: "Here's the exact dollar amount we'd save by moving X task to a local model."
The tool is available on GitHub at https://github.com/BuildWithAbid/llm-cost-profiler. The author plans to add support for tracking local model inference costs as well, using compute-time-based costing, and has asked the community whether this would be useful.
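Compute-time-based costing has not been released, so its design is unknown. As a rough illustration of the idea only, a local call could be priced by multiplying the time the hardware was busy by an hourly rate. The function names and the hourly rate below are assumptions for the sake of the example; only the ~$89/week API figure comes from the source.

```python
def local_cost_usd(compute_seconds, hourly_rate_usd):
    """Price local inference as busy time multiplied by an hourly hardware rate.

    hourly_rate_usd is an assumed figure (for example amortized GPU cost plus
    power); the source does not specify how the tool would derive it.
    """
    return compute_seconds * hourly_rate_usd / 3600.0


def weekly_savings(api_cost_usd, local_cost_usd_value):
    """Difference between weekly API spend and the estimated local cost."""
    return api_cost_usd - local_cost_usd_value


# Hypothetical scenario: the classifier's weekly workload takes one hour of
# local compute on hardware assumed to cost $0.50 per hour.
classifier_local = local_cost_usd(compute_seconds=3600, hourly_rate_usd=0.50)
classifier_savings = weekly_savings(89.0, classifier_local)
```

Under those assumed inputs, the comparison is a single subtraction, which is the kind of per-task dollar figure the author describes. The hard part in practice is choosing a defensible hourly rate.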
This type of cost profiling tool is particularly relevant for developers using AI coding agents, as it provides data-driven insights into where API spending might be inefficient compared to local alternatives.
📖 Read the full source: r/LocalLLaMA