Pi Coding Agent with Qwen 35B Q2: Using Filesystem as External Memory and Enforcing Context Guards

A Reddit user shared their approach to agentic coding with local LLMs, built on the Pi coding agent with Qwen 35B (Q2_K_XL quant via LM Studio). The core insight: treat the LLM as a logic processor, not a context database. The implementation enforces strict guards at the API boundary, so the model cannot bypass them.
Key constraints enforced by the system
- Write/edit limit: Rejects any output over 100 lines. Model must write a skeleton first, then fill in one section at a time. If it tries to dump a full file, the call is blocked with instructions to split the work.
- Thinking block cap: If the model's reasoning exceeds 2000 chars, it receives a correction to write conclusions to disk and move on.
- Context monitor: At 65% context usage, the model is told to write its state to files. At 80%, everything stops and the model writes its 'brain' to disk while it is still coherent.
- Persistent output: If the model gives a long answer without writing a file, it's instructed to save findings to a step file. Nothing stays only in context.
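The guards listed above can be pictured as plain checks that run on every model call before anything reaches disk. The following is a minimal sketch, not the project's actual code: the thresholds (100 lines, 2000 characters, 65% and 80%) come from the post, while the function names, the `GuardResult` type, and the wording of the correction messages are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the post; every name below is hypothetical.
MAX_WRITE_LINES = 100
MAX_THINKING_CHARS = 2000
SOFT_CONTEXT_RATIO = 0.65
HARD_CONTEXT_RATIO = 0.80


@dataclass
class GuardResult:
    allowed: bool
    message: str = ""  # correction sent back to the model when blocked


def check_write(content: str) -> GuardResult:
    """Block any single write/edit longer than the line limit."""
    line_count = len(content.splitlines())
    if line_count > MAX_WRITE_LINES:
        return GuardResult(
            False,
            f"Write blocked: {line_count} lines exceeds the "
            f"{MAX_WRITE_LINES}-line limit. Write a skeleton first, "
            "then fill in one section per call.",
        )
    return GuardResult(True)


def check_thinking(reasoning: str) -> GuardResult:
    """Flag reasoning blocks that run past the character cap."""
    if len(reasoning) > MAX_THINKING_CHARS:
        return GuardResult(
            False,
            "Reasoning too long: write your conclusions to disk and move on.",
        )
    return GuardResult(True)


def check_context(used_tokens: int, window_tokens: int) -> str:
    """Return 'ok', 'save_state' (65% or more) or 'stop' (80% or more)."""
    ratio = used_tokens / window_tokens
    if ratio >= HARD_CONTEXT_RATIO:
        return "stop"
    if ratio >= SOFT_CONTEXT_RATIO:
        return "save_state"
    return "ok"
```

Because the checks sit outside the model, a 652-line write is rejected regardless of what the model was prompted to do; the model only ever sees the correction message.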
External brain structure
The system uses .think/ and .plan/ directories as the model's external memory. Every step, decision, and finding is written to a file. When context compresses, the model reads its own notes back. The session purpose is saved separately to _purpose.md and re-injected after context compression, preserving the original goal.
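A rough sketch of this external memory might look like the following. The `.think/`, `.plan/` and `_purpose.md` names come from the post; the `ExternalMemory` class, the `step-NNN.md` naming scheme, the location of `_purpose.md` at the project root, and the way notes are stitched together after compression are all assumptions for illustration.

```python
from pathlib import Path


class ExternalMemory:
    """Hypothetical helper: keeps the model's notes on disk, not in context."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.purpose_file = root / "_purpose.md"  # location is an assumption
        for name in (".think", ".plan"):
            (root / name).mkdir(parents=True, exist_ok=True)

    def save_purpose(self, text: str) -> None:
        """Store the session goal separately from the step notes."""
        self.purpose_file.write_text(text, encoding="utf-8")

    def write_note(self, kind: str, text: str) -> Path:
        """Write a new numbered note into .think/ or .plan/."""
        if kind not in ("think", "plan"):
            raise ValueError("kind must be 'think' or 'plan'")
        folder = self.root / f".{kind}"
        index = len(list(folder.glob("step-*.md"))) + 1
        path = folder / f"step-{index:03d}.md"  # made-up naming scheme
        path.write_text(text, encoding="utf-8")
        return path

    def rebuild_context(self) -> str:
        """Text to re-inject after compression: purpose first, then notes."""
        parts = []
        if self.purpose_file.exists():
            purpose = self.purpose_file.read_text(encoding="utf-8")
            parts.append("# Purpose\n" + purpose)
        for kind in ("plan", "think"):
            folder = self.root / f".{kind}"
            for path in sorted(folder.glob("step-*.md")):
                body = path.read_text(encoding="utf-8")
                parts.append(f"# {kind}/{path.name}\n" + body)
        return "\n\n".join(parts)
```

Putting the purpose first in the rebuilt text reflects the post's point that the original goal must survive compression; how much of the note history to read back is a separate choice that this sketch does not try to settle.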
Knowledge distillation
A /distill command crawls a codebase, builds an import graph, topologically sorts the files, and has the model summarize them one per turn into a knowledge base. The manifest is split into pages of 50 files so that it never consumes the whole context. Users can drop files like svelte5-gotchas.md or astro-gotchas.md into a knowledge folder; an isolated LLM call selects which ones are relevant to the current task, and only the content of the selected files gets injected into the main conversation.
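The ordering and paging steps can be sketched with Python's standard `graphlib` module. This is an illustration under stated assumptions rather than the tool's implementation: the import graph is supplied by hand as a dictionary instead of being crawled, and `distill_order` and `paginate` are hypothetical names. The page size of 50 is the figure from the post.

```python
from graphlib import TopologicalSorter

PAGE_SIZE = 50  # manifest page size mentioned in the post


def distill_order(imports: dict[str, set[str]]) -> list[str]:
    """Order files so each one comes after the files it imports.

    `imports` maps a file to the set of files it depends on.
    Raises graphlib.CycleError if the import graph contains a cycle.
    """
    return list(TopologicalSorter(imports).static_order())


def paginate(files: list[str], size: int = PAGE_SIZE) -> list[list[str]]:
    """Split the ordered manifest into pages of at most `size` files."""
    return [files[i:i + size] for i in range(0, len(files), size)]


if __name__ == "__main__":
    graph = {
        "app.py": {"utils.py", "models.py"},
        "models.py": {"utils.py"},
        "utils.py": set(),
    }
    for page_number, page in enumerate(paginate(distill_order(graph)), 1):
        print(page_number, page)
```

Summarizing dependencies first means that by the time the model reaches a file, the summaries of everything it imports already exist in the knowledge base. Real codebases often contain import cycles, which this sketch does not handle.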
Real-world result
The user asked the model to build a Three.js plane flying game. The first attempt tried to write 652 lines in one call, and the guard rejected it. The model replanned, wrote a skeleton, then filled in features one edit at a time. The final result was a working game with a 3D plane model, obstacles, HUD, minimap, and start/game over screens, all at Q2 quant.
The user treats Q2_K_XL quantization as the floor for this setup and notes that Q4 or Q8 should yield better results. The code is available on GitHub: github.com/Kodrack/Pi-forge.
📖 Read the full source: r/LocalLLaMA