Approach to Self-Improving Memory in Local AI Agents

Memory Architecture for Persistent AI Agents
A developer on r/LocalLLaMA has shared their approach to building AI agents that don't repeat mistakes across sessions. The core problem is that every session starts from zero: the context window resets, and corrections made in one session are lost by the next.
Memory Implementation
The system uses markdown as the source of truth instead of a database. MEMORY.md is human-editable: delete a line in vim and the agent forgets it. SQLite and FAISS (HNSW, 768-dim) are derived caches that can be rebuilt from the markdown at any time. This also lets users version-control their agent's memory with git.
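A minimal sketch of the "derived cache" idea, covering only the SQLite side. The post does not describe the layout of MEMORY.md, so the one-entry-per-bullet format, the table name `memory`, and the function names here are all assumptions made for illustration.

```python
import sqlite3


def parse_memory(markdown_text: str) -> list[str]:
    """Return one entry per markdown bullet line (assumed MEMORY.md format)."""
    entries = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            entries.append(stripped[2:].strip())
    return entries


def rebuild_cache(markdown_text: str, conn: sqlite3.Connection) -> int:
    """Throw away the derived table and recreate it from the markdown source."""
    conn.execute("DROP TABLE IF EXISTS memory")  # hypothetical table name
    conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    entries = parse_memory(markdown_text)
    conn.executemany("INSERT INTO memory (text) VALUES (?)", [(e,) for e in entries])
    conn.commit()
    return len(entries)
```

Because the table is always rebuilt from the file, deleting a bullet from the markdown and rebuilding is enough to make the entry disappear from the cache; nothing needs to be deleted from the database by hand.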
Episode Scoring and Rule Learning
Each execution is scored +1 or -1 and saved as an episode. On similar future tasks, relevant episodes are pulled into context. When the same error signature (SHA256 of tool name + normalized error) shows up twice within 7 days, a rule learner generates a one-line prevention rule.
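The signature and the "twice within 7 days" trigger could look roughly like the sketch below. The post does not say how errors are normalized, so the normalization steps (lowercasing, masking paths, hex values, and numbers) and the function names are assumptions.

```python
import hashlib
import re
from datetime import datetime, timedelta


def normalize_error(message: str) -> str:
    """Assumed normalization: mask details that vary between runs."""
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)
    text = re.sub(r"(/[\w.\-]+)+", "<path>", text)
    text = re.sub(r"\d+", "<n>", text)
    return re.sub(r"\s+", " ", text).strip()


def error_signature(tool: str, message: str) -> str:
    """SHA256 over the tool name plus the normalized error text."""
    payload = f"{tool}:{normalize_error(message)}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_learn_rule(timestamps: list[datetime], window_days: int = 7) -> bool:
    """True if two occurrences of one signature fall within the window."""
    ordered = sorted(timestamps)
    window = timedelta(days=window_days)
    return any(b - a <= window for a, b in zip(ordered, ordered[1:]))
```

With this normalization, two "file not found" errors that differ only in path and line number hash to the same signature, while the same message from a different tool does not.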
Rules start at 0.40 confidence and need to reach 0.60 before they are actually injected into future prompts. Each success raises confidence by 0.03, and each failure lowers it by 0.05. Rules that don't help eventually decay away.
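The update rule is simple enough to sketch directly. The constants come from the post; the `Rule` class, the clamping to [0, 1], and the rounding to two decimals are assumptions added to keep the example tidy.

```python
from dataclasses import dataclass

START, INJECT_AT = 0.40, 0.60  # values from the post
GAIN, PENALTY = 0.03, 0.05     # values from the post


@dataclass
class Rule:
    text: str
    confidence: float = START

    def record(self, success: bool) -> None:
        """Apply one outcome; clamping and rounding are assumptions."""
        delta = GAIN if success else -PENALTY
        self.confidence = round(min(1.0, max(0.0, self.confidence + delta)), 2)

    @property
    def injected(self) -> bool:
        """A rule is only added to prompts once it reaches the threshold."""
        return self.confidence >= INJECT_AT
```

Under these numbers a new rule needs 7 consecutive successes to be injected (0.40 + 7 × 0.03 = 0.61). The asymmetry also sets a break-even point: confidence only rises in the long run if the success rate p satisfies 0.03p > 0.05(1 - p), that is, p > 0.625.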
Trust Escalation System
Instead of configuring permission levels upfront, the agent tracks approval patterns. Five approvals at a 90%+ approval rate trigger auto-promotion, and a single revert triggers demotion. There's also a shadow mode for auditing.
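One possible reading of the promotion rule, as a sketch. The post gives only the thresholds, so everything else here is an assumption: that the rate is computed over all recorded decisions, that a revert resets the counters, and the `TrustTracker` name itself. Shadow mode is not modeled.

```python
from dataclasses import dataclass

MIN_APPROVALS = 5   # from the post
MIN_RATE = 0.90     # from the post


@dataclass
class TrustTracker:
    approvals: int = 0
    rejections: int = 0
    auto_approved: bool = False

    def record_decision(self, approved: bool) -> None:
        """Count a user decision and promote once both thresholds are met."""
        if approved:
            self.approvals += 1
        else:
            self.rejections += 1
        total = self.approvals + self.rejections
        if self.approvals >= MIN_APPROVALS and self.approvals / total >= MIN_RATE:
            self.auto_approved = True

    def record_revert(self) -> None:
        """One revert demotes; resetting the history is an assumption."""
        self.auto_approved = False
        self.approvals = 0
        self.rejections = 0
```

Under this reading, a single early rejection means 9 approvals are needed before promotion (9 of 10 is exactly 90%).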
Task Decomposition and Safety
Complex goals become a DAG (directed acyclic graph). Circular dependencies are caught via topological sort, and a failure cascades to dependent tasks via DFS (depth-first search). A completion gate checks 18 requirements (R01-R18): did the agent actually read files, write changes, verify results, and stay in the workspace?
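Both graph operations can be sketched in a few lines. This uses Kahn's algorithm for the topological sort and an iterative DFS for the cascade; the post does not say which variants the project uses, and the task names and the `deps` mapping (task to the set of tasks it depends on) are illustrative assumptions.

```python
from collections import deque


def _dependents(deps: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert the mapping: task -> tasks that depend on it."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    out: dict[str, list[str]] = {n: [] for n in nodes}
    for task, ds in deps.items():
        for d in ds:
            out[d].append(task)
    return out


def topo_order(deps: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    dependents = _dependents(deps)
    indegree = {n: len(deps.get(n, ())) for n in dependents}
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for task in sorted(dependents[node]):
            indegree[task] -= 1
            if indegree[task] == 0:
                ready.append(task)
    if len(order) != len(dependents):
        raise ValueError("circular dependency detected")
    return order


def cascade_failure(deps: dict[str, set[str]], failed: str) -> set[str]:
    """Iterative DFS over dependents: everything downstream of a failure."""
    dependents = _dependents(deps)
    blocked: set[str] = set()
    stack = [failed]
    while stack:
        node = stack.pop()
        for task in dependents.get(node, []):
            if task not in blocked:
                blocked.add(task)
                stack.append(task)
    return blocked
```

A cycle shows up as nodes whose in-degree never reaches zero, which is why comparing the length of the output with the number of nodes is enough to detect it.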
Safety features include 43 bash risk patterns, dual-pass analysis (raw + decoded), a fail-closed design (if the Guardian crashes, the command is denied), and a minimum writable path depth of 3 to guard against commands like rm -rf /.
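A rough sketch of how these three safeguards might fit together. The two patterns below are illustrative stand-ins, not the project's 43; treating "decoded" as base64 decoding, counting depth as path components below the root, and the function names are all assumptions.

```python
import base64
import binascii
import re
from pathlib import PurePosixPath

# Illustrative stand-ins only; the real pattern list is not given in the post.
RISK_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bcurl\b[^|]*\|\s*(?:ba)?sh\b"),
]


def decoded_variants(command: str):
    """Second pass: yield text hidden in base64-looking tokens (assumed encoding)."""
    for token in re.findall(r"[A-Za-z0-9+/]{8,}={0,2}", command):
        try:
            yield base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text, nothing to inspect


def guardian_allows(command: str) -> bool:
    """Check the raw and decoded forms; any internal error means deny."""
    try:
        texts = [command, *decoded_variants(command)]
        return not any(p.search(t) for p in RISK_PATTERNS for t in texts)
    except Exception:
        return False  # fail closed


def is_writable(path: str, min_depth: int = 3) -> bool:
    """Allow writes only to absolute paths at least min_depth levels below root."""
    p = PurePosixPath(path)
    parts = [part for part in p.parts if part.strip("/")]
    return p.is_absolute() and ".." not in parts and len(parts) >= min_depth
```

With a depth of 3 counted this way, `/` and `/home/user` are never writable, while `/home/user/project` is.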
The developer is seeking feedback on whether the confidence decay on rules feels right and whether the +0.03/-0.05 asymmetry is optimal. They're also wondering if there are better alternatives to HNSW for this scale (typically <10k episodes).
📖 Read the full source: r/LocalLLaMA