Hybrid AI Coding Workflow: Claude Planning, Local Execution

Hybrid AI coding workflow reduces cloud costs

A developer on r/LocalLLaMA shared a detailed workflow that combines cloud and local AI models to cut token costs without sacrificing coding quality. It rests on the observation that many coding tasks don't need an expensive cloud model.

The workflow architecture

The system follows a "Reason in the cloud, Execute locally" principle:

Planner (Claude 3.5 Sonnet): Receives the task and generates a precise task_context.md file containing instructions, file paths, and logic. This costs approximately 300-500 tokens.
Coder (Local Qwen2.5-Coder 30B via Ollama): Takes the specification plus the actual contents of the relevant files and writes the code. Because it runs locally, it adds no API cost.
Validator: A simple Bash script type-checks the result with tsc --noEmit (TypeScript) or mypy (Python).
Reviewer (Local Qwen2.5-Coder 7B): Runs in parallel to check for obvious logic flaws.
Auto-fix: If the build fails, the error log goes back to the local coder for 2-3 iterations.
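The five roles above can be sketched as a small control loop. This is a minimal Python sketch under stated assumptions, not code from the author's repository. The hypothetical callables `plan`, `code`, `review`, and `validate` stand in for the Claude call, the two local Qwen calls, and the Bash type-check step. The default of three fix rounds is an assumption based on the "2-3 iterations" figure. Running the reviewer alongside the type check is one possible reading of "in parallel".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RunResult:
    ok: bool                                    # did the final draft pass the type check?
    code: str                                   # last draft produced by the local coder
    attempts: int                               # number of drafts that were validated
    review: str                                 # reviewer notes on the last draft
    errors: list = field(default_factory=list)  # error logs from failed checks

def run_pipeline(
    task: str,
    files: dict,
    plan: Callable[[str], str],        # hypothetical cloud planner: task -> task_context.md text
    code: Callable[[str], str],        # hypothetical local coder: prompt -> code
    review: Callable[[str], str],      # hypothetical local reviewer: code -> notes
    validate: Callable[[str], tuple],  # hypothetical type check: code -> (passed, error log)
    max_fix_rounds: int = 3,           # assumption based on "2-3 iterations"
) -> RunResult:
    # The only call that reaches the paid cloud model.
    spec = plan(task)
    # The coder sees the spec plus the actual contents of the files it will touch.
    context = "\n\n".join(f"--- {path}\n{body}" for path, body in files.items())
    prompt = f"{spec}\n\n{context}"
    draft = code(prompt)
    errors = []
    notes = ""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(max_fix_rounds + 1):
            # Run the reviewer in the background while the type check runs.
            pending_review = pool.submit(review, draft)
            passed, log = validate(draft)
            notes = pending_review.result()
            if passed:
                return RunResult(True, draft, attempt + 1, notes, errors)
            errors.append(log)
            if attempt == max_fix_rounds:
                break
            # Auto-fix: feed the error log back to the local coder.
            draft = code(
                f"{prompt}\n\nThe previous attempt failed the type check:\n{log}\n"
                "Return a corrected version."
            )
    return RunResult(False, draft, max_fix_rounds + 1, notes, errors)
```

Passing the model calls in as plain functions keeps the loop testable offline with fakes. In the described setup, each of these steps is a Bash script calling Claude or Ollama.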

Implementation details

The entire pipeline is a set of Bash scripts that talk to the Ollama API using only curl and jq, so it needs no heavyweight Python or Node runtime. It auto-detects the project's language conventions (TypeScript, Python, C++, etc.) from the planner's output.
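Each curl and jq call described above amounts to one JSON request per model turn against Ollama's `/api/generate` endpoint. The Python sketch below mirrors that request using only the standard library. The following are assumptions, not details from the author's scripts: the default port 11434, the extension-counting language detection, and the `VALIDATORS` table.

```python
import json
import re
import urllib.request
from typing import Optional

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    # The JSON body a `jq -n` call would build: one non-streaming completion.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 600.0) -> str:
    # Roughly: curl -s "$URL" -d "$BODY" | jq -r '.response'
    req = urllib.request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Hypothetical detection rule: count the file extensions the planner's spec mentions.
EXTENSIONS = {
    ".ts": "typescript", ".tsx": "typescript",
    ".py": "python",
    ".cpp": "cpp", ".cc": "cpp", ".hpp": "cpp", ".h": "cpp",
}
EXT_PATTERN = re.compile(r"\.(?:tsx?|py|cpp|hpp|cc|h)\b")

# Hypothetical type-check commands per language (run from the project root).
VALIDATORS = {
    "typescript": ["tsc", "--noEmit"],
    "python": ["mypy", "."],
}

def detect_language(spec: str) -> Optional[str]:
    counts: dict = {}
    for ext in EXT_PATTERN.findall(spec):
        lang = EXTENSIONS[ext]
        counts[lang] = counts.get(lang, 0) + 1
    # Pick the most frequently mentioned language, if any.
    return max(counts, key=counts.get) if counts else None

def validator_for(spec: str) -> Optional[list]:
    lang = detect_language(spec)
    return VALIDATORS.get(lang) if lang else None
```

A reviewer turn would then look something like `generate("qwen2.5-coder:7b", review_prompt)`, assuming that model tag has been pulled locally. C++ is detected here but has no validator entry, since a syntax check such as `g++ -fsyntax-only` needs explicit file arguments.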

The developer notes that local models (even 30B ones) often fail at complex architectural reasoning but are surprisingly good at execution when given crystal-clear specifications.

Results and savings

On a recent TypeScript task that changed 12 files:

Claude usage was limited to the initial planning phase only
Local models handled everything else: writing 12 files, linting, and reviewing
Total savings: roughly 85% fewer tokens than doing everything inside the Claude Code CLI
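The calculation below reads the 85% figure as a reduction in Claude tokens; the post does not give absolute counts, so this is an assumption. Let $T_{\text{base}}$ be the tokens spent doing the whole task in Claude Code and $T_{\text{hybrid}}$ the tokens spent on planning alone. The headline number then means:

$$
1 - \frac{T_{\text{hybrid}}}{T_{\text{base}}} \approx 0.85
\quad\Longrightarrow\quad
T_{\text{hybrid}} \approx 0.15\,T_{\text{base}} \approx \frac{T_{\text{base}}}{6.7}
$$

In other words, the all-cloud run would have consumed roughly 6.7 times as many Claude tokens.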

The scripts are available on GitHub in the ai-orchestrator repository (username: Mybono) for anyone interested in the implementation details.

📖 Read the full source: r/LocalLLaMA