Practical experience replacing an automation stack with MCP servers and local LLMs

Setup and hardware
The developer runs a mix of Qwen 2.5 32B (quantized) and Llama 3.3 70B on a dual RTX 3090 rig. Each automation task gets its own MCP server exposing tools the model can call. In effect, each server is an API consumed by an LLM rather than a human.
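To make the "API for an LLM" idea concrete, here is a minimal sketch of the two JSON-RPC methods an MCP server answers for tools. `tools/list` lets the model discover tools and their input schemas, and `tools/call` lets it invoke one. This is a hand-rolled, stdlib-only illustration, not the official MCP SDK and not the author's code. It leaves out the transport (stdio or HTTP) and the initialization handshake. The `tool` decorator and the `count_lines` tool are hypothetical.

```python
import json

# Hypothetical in-process registry: tool name -> handler, description, JSON Schema.
TOOLS = {}

def tool(name, description, input_schema):
    """Register a plain function as a tool the model can discover and call."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description, "inputSchema": input_schema}
        return fn
    return register

@tool("count_lines", "Count the lines in a block of text.",
      {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]})
def count_lines(text):
    return str(len(text.splitlines()))

def reply(req_id, result=None, error=None):
    """Build a JSON-RPC 2.0 response carrying either a result or an error."""
    body = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)

def handle(request_json):
    """Answer tools/list and tools/call; any other method gets 'Method not found'."""
    req = json.loads(request_json)
    method, params, req_id = req.get("method"), req.get("params", {}), req.get("id")
    if method == "tools/list":
        return reply(req_id, {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()]})
    if method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return reply(req_id, error={"code": -32602, "message": "Unknown tool"})
        text = entry["fn"](**params.get("arguments", {}))
        return reply(req_id, {"content": [{"type": "text", "text": text}], "isError": False})
    return reply(req_id, error={"code": -32601, "message": "Method not found"})
```

The model never sees this code. It sees only the `tools/list` output and chooses calls from it, so the descriptions and schemas do most of the work.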
What works well
- Code review automation: Pointing the model at a git diff via MCP tools catches real issues, including logic bugs, missing error handling, and race conditions. The author rates it at about 70% as effective as a senior developer's review.
- Log analysis and alerting: An MCP server connects to the ELK stack, and the model monitors the logs for anomaly patterns. It has caught 3 production issues before Grafana alerts fired. The key is giving the model enough context about what "normal" looks like for your system.
- Documentation generation: The model reads the codebase through MCP file tools and generates or updates API docs. This saves hours per week, and the output quality is genuinely good.
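In the code review workflow, the MCP tool has to turn a raw `git diff` into something the model can review. The sketch below is an assumption, not the author's pipeline. It splits the diff into per-file chunks so each file fits in the context window, then pairs each chunk with a prompt naming the issue classes the source mentions. `split_diff_by_file`, `review_prompts`, and `REVIEW_PROMPT` are hypothetical names. The header parsing assumes no file path contains the substring ` b/`.

```python
# Hypothetical prompt template; the issue classes are the ones named in the source.
REVIEW_PROMPT = (
    "Review this diff for logic bugs, missing error handling and race conditions.\n"
    "File: {path}\n\n{diff}"
)

def split_diff_by_file(diff_text):
    """Split `git diff` output into {path: per-file diff}, keeping each file's header lines."""
    chunks, path, lines = {}, None, []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            if path is not None:
                chunks[path] = "\n".join(lines)
            # Header looks like "diff --git a/src/app.py b/src/app.py"; keep the b/ path.
            path = line.split(" b/", 1)[-1]
            lines = [line]
        elif path is not None:
            lines.append(line)
    if path is not None:
        chunks[path] = "\n".join(lines)
    return chunks

def review_prompts(diff_text):
    """Return one prompt per changed file, ready to send to the model."""
    return [REVIEW_PROMPT.format(path=p, diff=d)
            for p, d in split_diff_by_file(diff_text).items()]
```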
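For log analysis, the source stresses that the model must know what "normal" looks like. One way to supply that is to summarise a historical baseline (mean, spread, range) for a metric and put it in the prompt next to the current value. This is a sketch under assumptions, not the author's method. `baseline_context` is a hypothetical helper, and the history would come from an ELK query. The z-score is a convenient yardstick chosen here, not something the source prescribes.

```python
import statistics

def baseline_context(history, current, service, metric="errors/min"):
    """Describe what 'normal' looks like for a metric and how far the current value deviates."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev:
        z = abs(current - mean) / stdev
    else:
        # A perfectly flat baseline: any change at all counts as anomalous.
        z = 0.0 if current == mean else float("inf")
    return (
        f"Normal for {service}: {mean:.1f} {metric} on average "
        f"(std dev {stdev:.1f}, range {min(history)}-{max(history)}). "
        f"Current: {current} {metric}, {z:.1f} std devs from the mean."
    )
```

The returned sentence is meant to be prepended to the log excerpt the model analyses, so the model judges anomalies against the system's own history rather than a generic sense of "too many errors".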
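Documentation generation depends on an MCP file tool that lets the model read the codebase. Below is a minimal sketch of such a tool. It assumes, beyond what the source states, that the tool should refuse paths outside the repository and cap how much text it returns per call. `make_read_file` and `max_bytes` are hypothetical.

```python
from pathlib import Path

def make_read_file(repo_root):
    """Return a read_file tool confined to repo_root."""
    root = Path(repo_root).resolve()

    def read_file(relative_path, max_bytes=50_000):
        # Resolve symlinks and ".." first, then check the result is still inside the repo.
        target = (root / relative_path).resolve()
        if target != root and root not in target.parents:
            raise PermissionError(f"{relative_path} is outside the repository")
        # Truncate large files; errors="replace" tolerates a multi-byte char cut in half.
        return target.read_bytes()[:max_bytes].decode("utf-8", errors="replace")

    return read_file
```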
What doesn't work (yet)
- Multi-step reasoning chains: Anything requiring more than 3-4 sequential tool calls starts to go off the rails as the model loses track of the original goal. Smaller context windows make this worse. Chain-of-thought prompting helps but doesn't solve it.
- Real-time decision making: Latency on 70B models rules out time-sensitive tasks. The code review pipeline takes 2-3 minutes per PR, which is fine for async workflows but useless for real-time applications.
- Creative problem solving: Local models struggle with tasks requiring approaches not well-represented in training data. API models (Claude, GPT-4) are noticeably better here.
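The source reports that chain-of-thought prompting does not fix long tool chains. One possible guard, which is an assumption and not something the author reports trying, is to restate the original goal in every prompt and enforce a hard budget on tool calls. The loop then stops instead of drifting. `run_agent`, `model_step`, and the action dictionary format are hypothetical. The default of 4 calls mirrors the 3-4 call threshold reported above.

```python
MAX_TOOL_CALLS = 4  # mirrors the ~3-4 call threshold reported in the source

def run_agent(goal, model_step, tools, max_calls=MAX_TOOL_CALLS):
    """Restate the goal on every turn and stop after max_calls tool calls."""
    transcript = []
    for calls_made in range(max_calls + 1):
        prompt = "\n".join([f"Original goal: {goal}"] + transcript)
        # Hypothetical action format: {"tool": name, "args": {...}} or {"answer": text}.
        action = model_step(prompt)
        if "answer" in action:
            return action["answer"]
        if calls_made == max_calls:
            break  # budget spent; do not let the chain keep drifting
        result = tools[action["tool"]](**action["args"])
        transcript.append(f"{action['tool']}({action['args']}) -> {result}")
    return None  # caller decides what to do, e.g. hand the task to a human
```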
Key architectural lessons
- Keep MCP servers stateless. Let the model manage state through tool calls, not server-side sessions.
- Build retry logic into your MCP client, not the server. Models will make malformed tool calls approximately 5% of the time.
- Log every tool call and response so you can debug when the model does something unexpected.
- Use structured output (JSON mode) for anything downstream systems consume. Free-form text output is a debugging nightmare.
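A stateless tool keeps no session on the server. Whatever the next call needs travels through the tool's arguments and results. The sketch below shows this with pagination. `search_logs` is hypothetical and reads from an in-memory stand-in for a log store. It returns an opaque cursor, and the model passes that cursor back to fetch the next page, so any server instance can answer any call.

```python
import base64
import json

LOG_LINES = [f"line {i}" for i in range(10)]  # hypothetical stand-in for a real log store

def _encode(state):
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()

def search_logs(query, cursor=None, page_size=3):
    """Stateless tool: all paging state lives in the cursor the model hands back."""
    offset = json.loads(base64.urlsafe_b64decode(cursor))["offset"] if cursor else 0
    matches = [entry for entry in LOG_LINES if query in entry]
    next_offset = offset + page_size
    return {
        "results": matches[offset:next_offset],
        "next_cursor": _encode({"offset": next_offset}) if next_offset < len(matches) else None,
    }
```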
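Because malformed tool calls are routine, the client can validate each call before executing it and re-prompt with an error message when validation fails. Below is a minimal sketch under two assumptions. First, the model's tool call arrives as a JSON string with `name` and `arguments` keys, which is a hypothetical format because real clients and model runtimes differ. Second, `ask_model` is a hypothetical callable that sends the feedback and returns the model's next attempt.

```python
import json

def call_tool_with_retry(ask_model, required_args, max_attempts=3):
    """Validate the model's tool call client-side and re-prompt with feedback on failure."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = ask_model(feedback)  # hypothetical: returns the tool call as a JSON string
        try:
            call = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"Attempt {attempt} was not valid JSON ({exc.msg}). Send the tool call again."
            continue
        args = call.get("arguments") if isinstance(call, dict) else None
        if isinstance(args, dict) and "name" in call and all(k in args for k in required_args):
            return call
        feedback = (f"Attempt {attempt} must be a JSON object with 'name' and "
                    f"'arguments' containing {required_args}. Send the tool call again.")
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts")
```

Keeping this loop in the client means the server only ever sees well-formed calls, which matches the advice to keep the server simple and stateless.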
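One lightweight way to log every tool call and response is to wrap each tool handler in a decorator. The decorator emits one JSON line per call with the arguments, the result or exception, and the duration. It uses only the standard `logging` module. `logged_tool` and the record fields are assumptions. The wrapper accepts keyword arguments only, on the assumption that tool calls arrive as named arguments.

```python
import functools
import json
import logging
import time

log = logging.getLogger("mcp.tools")

def logged_tool(fn):
    """Log each call's arguments, result or error, and duration as one JSON line."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        record = {"tool": fn.__name__, "args": kwargs}
        start = time.perf_counter()
        try:
            record["result"] = fn(**kwargs)
            return record["result"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure alike, so every call leaves a log line.
            record["ms"] = round((time.perf_counter() - start) * 1000, 1)
            log.info(json.dumps(record, default=str))  # default=str keeps odd values loggable
    return wrapper
```

Pointing a handler such as `logging.FileHandler` at this logger produces a JSON-lines file that is easy to grep when a model run goes wrong.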
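For structured output, the consuming side should reject anything that does not match the expected shape rather than pass it on. The sketch below parses the JSON-mode output, maps each item onto a dataclass, and raises on missing keys or bad values. It assumes a hypothetical review-findings schema (`file`, `line`, `severity`, `message`) that does not come from the source.

```python
import json
from dataclasses import dataclass

SEVERITIES = {"low", "medium", "high"}  # hypothetical severity levels

@dataclass
class Finding:
    file: str
    line: int
    severity: str
    message: str

def parse_findings(model_output):
    """Parse JSON-mode output into Findings, failing loudly instead of passing bad data on."""
    data = json.loads(model_output)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    findings = []
    for item in data["findings"]:
        f = Finding(**item)  # TypeError on missing or unexpected keys
        # Dataclasses do not check types, so validate the fields explicitly.
        if not (isinstance(f.file, str) and isinstance(f.line, int)
                and isinstance(f.message, str) and f.severity in SEVERITIES):
            raise ValueError(f"invalid finding: {item!r}")
        findings.append(f)
    return findings
```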
📖 Read the full source: r/LocalLLaMA