Using a Local LLM as a Claude Code Subagent to Reduce Context Usage

✍️ OpenClawRadar📅 Published: March 2, 2026🔗 Source
Using a Local LLM as a Claude Code Subagent to Reduce Context Usage
Ad

A developer on r/LocalLLaMA demonstrates how to use Claude Code to delegate tasks to a local LLM running via LM Studio, reducing Claude's context usage by keeping file content local.

How It Works

The system uses a small Python script (~120 lines, standard library only) that runs an agent loop:

  • You pass Claude a task description without file content
  • The script sends it to LM Studio's /v1/chat/completions endpoint with read_file and list_dir tool definitions
  • The local model calls those tools itself to read the files it needs
  • The loop continues until it produces a final answer
  • Claude sees only the result, not the file content

Example Usage

python3 agent_lm.py --dir /path/to/project "summarize solar-system.html"
# [turn 1] → read_file({'path': 'solar-system.html'})
# [turn 2] → This HTML file creates an interactive animated solar system...

The file content goes into the local model's context (tested with Qwen3.5 35B 4-bit via MLX on Apple Silicon), not Claude's.

What It's Good For

  • Code summarization and explanation
  • Bug finding
  • Boilerplate / first-draft generation
  • Text transformation and translation (tested with Hebrew)
  • Logic tasks and reasoning (use --think flag for harder problems)
Ad

What It's Not Good For

  • Tasks that require Claude's full context, such as multi-file understanding where relationships matter
  • Tasks needing the current conversation history
  • Anything where accuracy is critical

The author describes it as "a Haiku-tier assistant, not a replacement."

Setup

  • LM Studio running locally with the API server enabled
  • One Python script for the agent loop, one for simple prompt-only queries
  • Both wired into a global ~/.claude/CLAUDE.md so Claude Code knows to offer delegation when relevant
  • No MCP server, no pip dependencies, no plugin infrastructure needed
  • Recommendation: Add {%- set enable_thinking = false %} to the top of the jinja template - for most tasks this saves time and tokens without quality degradation

The author notes they had Claude help write the post but with supervision and corrections, and is happy to share the scripts if there's interest.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also

Extracting OpenClaw Components: A Developer's Experience with Lane Queue and Memory System
Tools

Extracting OpenClaw Components: A Developer's Experience with Lane Queue and Memory System

A developer attempted to extract specific components from OpenClaw for use in personal AI agents, testing the Lane Queue task execution system and examining the memsearch memory system. The Lane Queue was successfully reimplemented in Python using documentation, revealing gaps in documentation and 13 implementation issues.

OpenClawRadar
Local MCP Memory System with Consolidation for AI Conversations
Tools

Local MCP Memory System with Consolidation for AI Conversations

A developer built an MCP server that provides persistent local memory for AI clients, using Qwen 2.5-7B to consolidate conversations into structured knowledge documents every 6 hours. The system runs entirely on your hardware with semantic dedup, adaptive scoring, and FAISS vector search.

OpenClawRadar
Open-source AI job search system built with Claude Code evaluates offers, generates tailored resumes
Tools

Open-source AI job search system built with Claude Code evaluates offers, generates tailored resumes

A developer open-sourced a Claude Code project that turns your terminal into a job search command center. The system evaluates job offers across 10 dimensions, generates ATS-optimized PDF resumes, scans 45+ company career pages, and includes 14 skill modes.

OpenClawRadar
Pu.sh: 400-Line Shell Script Coding-Agent Harness from HN
Tools

Pu.sh: 400-Line Shell Script Coding-Agent Harness from HN

Pu.sh is a portable coding-agent harness in 400 lines of shell (sh, curl, awk), supporting Anthropic + OpenAI, 7 tools, REPL, checkpoint/resume, and pipe mode — with 90 no-API tests.

OpenClawRadar