Giving Claude a Local LLM as an Assistant via MCP on Mac

✍️ OpenClawRadar · 📅 Published: May 12, 2026 · 🔗 Source

A Reddit user described how they gave Claude access to a local LLM running on a Mac Mini M4 (24GB RAM) through an MCP connection to Ollama. Ollama serves Qwen 2.5 Coder (14B) as an assistant named 'Frank', to which Claude can delegate tasks under three rules: delegating must cost Claude fewer tokens than doing the work itself, it must not degrade output quality, and Claude must review the final result.

Setup Details

  • Hardware: Mac Mini M4 with 24GB RAM.
  • Local LLM: Qwen 2.5 Coder (14B) running via Ollama (also tested with LM Studio).
  • Connection: MCP (Model Context Protocol) to link Claude (CLI or Desktop App) with the local model.
  • Instructions: Claude was given a Markdown memory file (memory.md) with guidelines for when and how to use Frank: delegate text processing and the handling of large CSS/HTML files, and do so only when it saves tokens without degrading output quality.
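
To make the connection step more concrete, here is a minimal sketch of the kind of call an MCP tool could make to the local model. It is not the Reddit user's code. It assumes Ollama is listening on its default local port and that the model is available under the tag qwen2.5-coder:14b; the function names (build_request, parse_response, ask_frank) are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumptions: Ollama's default local endpoint and a Qwen 2.5 Coder 14B model tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:14b"


def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for a single, non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def parse_response(raw: bytes) -> str:
    """Extract the generated text from Ollama's JSON reply."""
    return json.loads(raw.decode("utf-8"))["response"]


def ask_frank(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send a prompt to the local model and return its answer (hypothetical helper)."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_response(resp.read())


if __name__ == "__main__":
    # Requires a running Ollama instance with the model already pulled.
    print(ask_frank("Summarize in one sentence: CSS controls page presentation."))
```

A function like ask_frank could then be registered as a tool in an MCP server so that Claude (CLI or Desktop App) can call it; how the original setup wired this up is not described in the source.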

Practical Use Cases

  • Text processing and transformation — offloaded to Frank to reduce Claude's token usage.
  • Handling large CSS/HTML files that would be expensive for Claude to process directly.
  • Running performance, coding, and logic tests: Claude evaluated local models through Frank, so the user did not have to run the tests manually.
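
The delegation rules (fewer tokens, no quality loss, final review) can be illustrated with a rough cost comparison. This is only a sketch under stated assumptions, not the guidelines from the user's memory.md: it assumes about 4 characters per token, that the local model reads the large input itself (for example from a file path), and that Claude reads the full result during its final review. The names Task, estimate_tokens, and should_delegate are hypothetical.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Task:
    instruction: str             # what Claude would tell the local model
    payload: str                 # the large input, e.g. a CSS/HTML file
    expected_output_tokens: int  # rough size of the result
    quality_critical: bool = False  # True if a weaker model would hurt quality


def direct_cost(task: Task) -> int:
    """Tokens Claude spends doing the work itself: read input, write output."""
    return estimate_tokens(task.payload) + task.expected_output_tokens


def delegated_cost(task: Task) -> int:
    """Tokens Claude spends when delegating: write instruction, review result."""
    return estimate_tokens(task.instruction) + task.expected_output_tokens


def should_delegate(task: Task) -> bool:
    """Delegate only if quality is not at risk and it saves Claude tokens."""
    if task.quality_critical:
        return False
    return delegated_cost(task) < direct_cost(task)


if __name__ == "__main__":
    big_css = Task("Minify this stylesheet", "a{color:red}" * 1000, 2000)
    print(should_delegate(big_css))
```

Under these assumptions, a large stylesheet with a short instruction qualifies for delegation, while a tiny input with a long instruction does not, because writing the instruction costs more than reading the input directly.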

The user noted they are operating at the limits of their RAM/GPU and cannot test larger models (30B+). They invited others with more powerful hardware to try similar setups and share results.

This approach gives Claude an assistant with no per-token API cost: token-heavy tasks are offloaded to the local model, and Claude's final review maintains quality.

📖 Read the full source: r/ClaudeAI
