Reflect MCP Server Implements Reflexion Paper for Persistent Coding Agent Memory

OpenClawRadar | Published: April 16, 2026

A developer has implemented the approach from the Reflexion paper (Shinn et al., NeurIPS 2023) as an MCP server. The goal is to fix a common problem with local coding agents: they have no persistent memory between sessions. The tool, called reflect-mcp, lets agents remember past mistakes and avoid repeating them.

How It Works

The system operates through a structured workflow:

After every test failure, the agent critiques its own work and extracts patterns from the error
The resulting lessons are stored for future reference
Before starting a new task, the agent recalls relevant past lessons using full-text search
Pattern matching is fully regex-based: no LLM calls are needed for classification
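To show what regex-only classification can look like, here is a minimal Python sketch. The pattern ids and regexes are hypothetical. They were chosen to mirror the unwrap() and timezone failures the developer mentions, and the source does not describe reflect-mcp's actual rule set. The Rust panic messages and the Python datetime message in the rules are the standard texts those runtimes emit.

```python
import re
from typing import Optional

# Hypothetical rule table of (pattern_id, regex). The ids and regexes are
# illustrative assumptions, not reflect-mcp's actual classification rules.
PATTERNS = [
    ("unwrap-on-none", re.compile(r"called `Option::unwrap\(\)` on a `None` value")),
    ("unwrap-on-err", re.compile(r"called `Result::unwrap\(\)` on an `Err` value")),
    ("naive-vs-aware-datetime", re.compile(r"offset-naive and offset-aware datetimes")),
    ("index-out-of-bounds", re.compile(r"index out of bounds: the len is \d+ but the index is \d+")),
]


def classify(error_text: str) -> Optional[str]:
    """Return the id of the first rule whose regex matches, or None (no LLM call)."""
    for pattern_id, regex in PATTERNS:
        if regex.search(error_text):
            return pattern_id
    return None


panic = "thread 'main' panicked at src/main.rs:4:37:\ncalled `Option::unwrap()` on a `None` value"
print(classify(panic))  # unwrap-on-none
```

Because the rules are plain regexes, classification is deterministic, fast, and free. The trade-off is that an error message nobody has written a rule for falls through as `None`.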

The developer notes that error messages are predictable enough for deterministic matching to work reliably. The agent writes the critique because it has the context of the failure. The server then structures the lessons and deduplicates them.
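The split between agent and server could be implemented roughly as in the Python sketch below. The agent supplies free-text critiques, and the server collapses near-identical ones into a single lesson with an occurrence count. The normalization rules (lowercasing, collapsing whitespace, masking digits), the `Lesson` type, and the counter are all assumptions made for illustration. The source does not say how reflect-mcp actually deduplicates.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Lesson:
    pattern_id: str      # category label (hypothetical ids)
    critique: str        # free-text lesson written by the agent
    occurrences: int = 1


def dedup_key(pattern_id: str, critique: str) -> str:
    """Assumed normalization: case, whitespace, and numbers don't distinguish lessons."""
    normalized = re.sub(r"\s+", " ", critique.strip().lower())
    normalized = re.sub(r"\d+", "N", normalized)  # e.g. line numbers, lengths
    return hashlib.sha256(f"{pattern_id}\x00{normalized}".encode()).hexdigest()


class LessonStore:
    """In-memory stand-in for the server-side lesson table (hypothetical)."""

    def __init__(self) -> None:
        self._lessons: dict[str, Lesson] = {}

    def record(self, pattern_id: str, critique: str) -> Lesson:
        """Insert a new lesson, or bump the count of an equivalent existing one."""
        key = dedup_key(pattern_id, critique)
        existing = self._lessons.get(key)
        if existing is not None:
            existing.occurrences += 1
            return existing
        lesson = Lesson(pattern_id, critique.strip())
        self._lessons[key] = lesson
        return lesson

    def lessons(self) -> list[Lesson]:
        """Most frequently recurring lessons first."""
        return sorted(self._lessons.values(), key=lambda lesson: -lesson.occurrences)


store = LessonStore()
store.record("unwrap-on-none", "Validate user input before calling unwrap() on line 12.")
store.record("unwrap-on-none", "validate user input before calling unwrap()  on line 40.")
print(len(store.lessons()), store.lessons()[0].occurrences)  # 1 2
```

Counting repeats instead of discarding them keeps the memory small while still showing which mistakes recur most often.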

Technical Implementation

Built as an MCP (Model Context Protocol) server
Uses SQLite with FTS5 for storage and search
Works with any MCP-compatible client
Install via: cargo install reflect-mcp
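SQLite's FTS5 extension handles both the storage and the recall step. Below is a minimal sketch using Python's built-in `sqlite3` module, which assumes the bundled SQLite was compiled with FTS5 (most builds are). The table and column names are hypothetical and not taken from reflect-mcp. Recall ORs the quoted words of the task description together and orders hits by FTS5's built-in `rank` (BM25).

```python
import re
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
# pattern_id is stored but UNINDEXED, so only the critique text is searchable.
SCHEMA = "CREATE VIRTUAL TABLE IF NOT EXISTS lessons USING fts5(pattern_id UNINDEXED, critique)"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the lesson database; requires SQLite built with FTS5."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_lesson(conn: sqlite3.Connection, pattern_id: str, critique: str) -> None:
    """Persist one lesson after a test failure."""
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO lessons (pattern_id, critique) VALUES (?, ?)",
            (pattern_id, critique),
        )


def recall(conn: sqlite3.Connection, task: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return lessons relevant to a task description, best BM25 match first."""
    # Quote each word so characters like ( ) : - in the task text are not
    # parsed as FTS5 query syntax; OR the words so any overlap is a candidate.
    words = re.findall(r"\w+", task)
    if not words:
        return []
    query = " OR ".join(f'"{w}"' for w in words)
    return conn.execute(
        "SELECT pattern_id, critique FROM lessons WHERE lessons MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = open_store()
store_lesson(conn, "unwrap-on-none", "Validate user input before calling unwrap()")
store_lesson(conn, "naive-vs-aware-datetime", "Attach a timezone before comparing datetimes")
print(recall(conn, "parse user input from CLI args"))
```

Passing a file path instead of `":memory:"` is what would make the lessons survive between agent sessions.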

Results After One Week

The developer reported several improvements in their coding agent's behavior:

Stopped repeating the same unwrap() call on user input
Stopped forgetting timezone handling
Started avoiding previously seen failure patterns automatically
Pattern tracking made recurring mistakes across the project visible

The project is available on GitHub at https://github.com/rohansx/reflect. The developer is seeking feedback from others who have experimented with persistent memory setups for local coding agents.

Source: r/LocalLLaMA
