LLM Council Analysis Reveals Practical Claude Code Token Optimization Strategies

Problem and Experiment Setup
A developer who kept hitting Claude Code's daily usage limits ran an experiment with LLM Council (https://github.com/karpathy/llm-council). The setup used five personas that were made to critique, challenge, and refine proposed solutions, followed by a peer-review round.
Key Findings
The analysis found that the biggest token drain was not task complexity but leaving "thinking mode" on by default. That alone was burning tokens at a rate close to Opus.
Practical Optimization Habits
- Turn OFF extended thinking by default
- /clear after every git commit (non-negotiable)
- Stop writing "yes / continue" prompts
- /compact every ~40 messages
- Keep CLAUDE.md lean, or you pay a token tax every session
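The "CLAUDE.md tax" is easy to ballpark: Claude Code reads CLAUDE.md into context when a session starts, so every line in it is paid for again in each new session. A minimal sketch, assuming the common rule of thumb of about four characters per token (real counts depend on the tokenizer) and a sessions-per-day figure you supply yourself; `estimate_tokens` and `daily_claude_md_tax` are hypothetical helpers, not part of any Claude Code tooling:

```python
from pathlib import Path

# Assumption: ~4 characters per token is a rough rule of thumb for English
# text. Real counts depend on the model's tokenizer, so treat results as
# ballpark figures only.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Roughly estimate how many tokens a piece of text occupies."""
    return round(len(text) / chars_per_token)


def daily_claude_md_tax(text: str, sessions_per_day: int) -> int:
    """Estimate tokens spent per day just loading the file.

    sessions_per_day is a hypothetical input you supply. This simplified
    model charges the file once per session and ignores re-sends and caching.
    """
    return estimate_tokens(text) * sessions_per_day


if __name__ == "__main__":
    claude_md = Path("CLAUDE.md")
    if claude_md.exists():
        text = claude_md.read_text(encoding="utf-8")
        print(f"~{estimate_tokens(text)} tokens loaded per session")
        print(f"~{daily_claude_md_tax(text, 10)} tokens/day at 10 sessions")
    else:
        print("No CLAUDE.md in the current directory")
```

For instance, a 2,000-character file comes out at roughly 500 tokens per session, or about 5,000 tokens a day across ten sessions, before any real work starts.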
Mental Shift and Results
The core insight: stop treating intelligence as the default and treat it as a resource you deploy intentionally. According to the developer, this shift enables:
- 30-50% token savings instantly
- Ability to actually use Opus without fear
- Predictable daily workflow instead of random limit hits
The council emphasized one rule: if you don't track /cost, you're not optimizing, you're guessing.
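Tracking /cost only pays off if you compare readings over time. A minimal sketch, assuming you copy a per-session usage figure from /cost into two lists by hand (one from before changing habits, one after; use the same unit, tokens or dollars, in both), that computes the percentage drop in the average. The `percent_reduction` helper is hypothetical:

```python
from statistics import mean


def percent_reduction(baseline: list[float], current: list[float]) -> float:
    """Percent drop in average per-session usage from baseline to current.

    Inputs are per-session figures copied by hand from /cost (an assumed
    workflow). Averaging several sessions smooths out task-to-task variation.
    """
    if not baseline or not current:
        raise ValueError("need at least one reading in each list")
    before, after = mean(baseline), mean(current)
    if before == 0:
        raise ValueError("baseline average must be non-zero")
    return 100.0 * (before - after) / before
```

With hypothetical readings, baseline sessions averaging 120,000 tokens and later sessions averaging 90,000 give `percent_reduction([120000], [90000])` of 25.0. A negative result means usage went up.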
Outcome
With the full playbook in place, the developer reported:
- ~60-70% reduction in token usage
- Same or better output quality
- Opus becomes usable for high-value work
The developer noted this approach was more effective than any single prompt hack.
📖 Read the full source: r/ClaudeAI