Practical experience replacing automation stack with MCP servers and local LLMs

✍️ OpenClawRadar | 📅 Published: March 1, 2026

Setup and hardware

The developer runs a mix of Qwen 2.5 32B (quantized) and Llama 3.3 70B on a dual 3090 rig. Each automation task gets its own MCP server that exposes tools the model can call, functioning like an API that an LLM consumes instead of a human.
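
The "one server per task, tools the model can call" idea can be sketched roughly as below. This is a minimal in-process illustration using only the standard library, not the MCP SDK or the developer's actual servers; the `tool` decorator, the `count_lines` tool, and the `{"tool": ..., "arguments": ...}` request shape are all assumptions made for the example.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical in-process registry. A real MCP server would expose these
# tools over the MCP protocol instead of a plain function call.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def count_lines(text: str) -> int:
    """Illustrative tool: count the lines in a piece of text."""
    return len(text.splitlines())


def handle_call(request_json: str) -> str:
    """Dispatch one call shaped like {"tool": name, "arguments": {...}}."""
    try:
        request = json.loads(request_json)
        result = TOOLS[request["tool"]](**request["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown tool, or wrong arguments all come back
        # as a structured error the client can act on.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
```

The handler keeps nothing between calls, so each request carries everything needed to answer it.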

What works well

  • Code review automation: Pointing the model at a git diff via MCP tools catches real issues including logic bugs, missing error handling, and race conditions. The developer rates it at roughly 70% as good as a senior dev review.
  • Log analysis and alerting: An MCP server connects to the ELK stack, and the model monitors for anomaly patterns. It has caught 3 production issues before Grafana alerts fired. The key is giving enough context about what "normal" looks like for your system.
  • Documentation generation: Model reads the codebase through MCP file tools and generates/updates API docs, saving hours per week with genuinely good output quality.
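
For the log analysis case, one possible way to give the model context about what "normal" looks like is to summarise historical counts and put that summary in the prompt. The sketch below is an assumption about how this could be done, not the developer's pipeline; the window format (a dict of log level to count) and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def baseline_summary(history: List[Dict[str, int]]) -> str:
    """Summarise typical per-window log-level counts for use in a prompt.

    `history` is a list of time windows, each mapping a log level to the
    number of entries seen at that level (assumed format).
    """
    levels = sorted({level for window in history for level in window})
    lines = ["Typical counts per window (mean, max):"]
    for level in levels:
        counts = [window.get(level, 0) for window in history]
        lines.append(f"- {level}: mean {mean(counts):.1f}, max {max(counts)}")
    return "\n".join(lines)


def build_prompt(history: List[Dict[str, int]], current: Dict[str, int]) -> str:
    """Combine the baseline with the current window into one prompt."""
    observed = ", ".join(f"{level}={count}" for level, count in sorted(current.items()))
    return (
        f"{baseline_summary(history)}\n\n"
        f"Current window: {observed}\n"
        "Is the current window anomalous? Answer in JSON."
    )
```

For example, `build_prompt([{"ERROR": 2, "WARN": 10}, {"ERROR": 4, "WARN": 12}], {"ERROR": 40, "WARN": 11})` gives the model both the usual range and the spike it is being asked to judge.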

What doesn't work (yet)

  • Multi-step reasoning chains: Anything requiring more than 3-4 tool calls in sequence starts to go off the rails as the model loses context of the original goal. Smaller context windows make this worse. Chain-of-thought prompting helps but doesn't solve it.
  • Real-time decision making: Latency on 70B models rules out time-sensitive tasks. The code review pipeline takes 2-3 minutes per PR, which is fine for async workflows but useless for real-time applications.
  • Creative problem solving: Local models struggle with tasks requiring approaches not well-represented in training data. API models (Claude, GPT-4) are noticeably better here.
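
The source does not describe a fix for chains drifting after 3-4 tool calls. One mitigation that could be tried is to cap the chain length and restate the original goal on every turn. The sketch below illustrates that idea only; `next_step` is a hypothetical stand-in for the model, and the default cap of 4 is taken from the failure threshold reported above.

```python
from typing import Callable, List, Optional


def run_chain(
    goal: str,
    next_step: Callable[[str], Optional[str]],
    max_calls: int = 4,
) -> List[str]:
    """Run a tool-call chain, stopping once max_calls is reached.

    `next_step` stands in for the model: it receives a prompt and returns
    the next tool call as a string, or None when it considers the goal met.
    """
    calls: List[str] = []
    while len(calls) < max_calls:
        # Restate the original goal every turn so it is always in context.
        prompt = f"Goal: {goal}\nCalls so far: {calls}\nNext call?"
        call = next_step(prompt)
        if call is None:
            break
        calls.append(call)
    return calls
```

Tasks that exceed the cap could then be split into smaller goals, each run as its own short chain.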

Key architectural lessons

  • Keep MCP servers stateless. Let the model manage state through tool calls, not through a server-side session.
  • Build retry logic into your MCP client, not the server. Models will make malformed tool calls approximately 5% of the time.
  • Log every tool call and response for debugging when the model does something unexpected.
  • Use structured output (JSON mode) for anything downstream systems consume. Free-form text output is a debugging nightmare.
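
Three of the lessons above (client-side retries, logging every call, and structured JSON output) can be combined in one small client helper. This is a minimal sketch under stated assumptions: `ask_model` is a hypothetical callable wrapping whatever local model is in use, and the required `tool` and `arguments` keys are an illustrative schema, not one defined by the source.

```python
import json
import logging
from typing import Any, Callable, Dict, Sequence

logger = logging.getLogger("mcp_client")


def call_with_retry(
    ask_model: Callable[[str], str],
    prompt: str,
    required_keys: Sequence[str] = ("tool", "arguments"),
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Ask the model for a JSON tool call, retrying on malformed replies."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        # Feed the previous error back so the model can correct itself.
        full_prompt = prompt
        if last_error:
            full_prompt = f"{prompt}\n\nPrevious reply was invalid: {last_error}"
        raw = ask_model(full_prompt)
        logger.info("attempt=%d raw_reply=%r", attempt, raw)
        try:
            call = json.loads(raw)
            if not isinstance(call, dict):
                raise ValueError("reply is not a JSON object")
            missing = [key for key in required_keys if key not in call]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return call
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            last_error = str(exc)
            logger.warning("attempt=%d rejected: %s", attempt, last_error)
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts: {last_error}")
```

Keeping the retry in the client means the server never has to guess what a malformed call was meant to do, and the log lines record each raw reply for later debugging.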

📖 Read the full source: r/LocalLLaMA

