AgentWorkingMemory: A Local Memory System for AI Coding Agents

What AgentWorkingMemory Solves
AI coding agents like Claude Code lack persistent memory between sessions. Developers end up re-explaining architecture, database schemas, and previous decisions every time they start a new session, wasting time and context window space. While Claude Code offers some tools like --continue to resume conversations, auto-memory that saves notes to markdown files, and CLAUDE.md project documentation, these have limitations:
- --continue or --resume restores entire chat threads but consumes context window space and only works with one thread at a time
- Auto-memory loads the first 200 lines of MEMORY.md but lacks retrieval intelligence: it doesn't know which notes are relevant to current work
- Project docs like CLAUDE.md work for stable information but go stale quickly in fast-evolving projects
AgentWorkingMemory addresses these issues by accumulating knowledge across sessions, surfacing relevant context for current work, and improving over time without manual management.
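To make the retrieval gap concrete, here is a minimal Python sketch contrasting position-based loading (take the first 200 lines) with relevance-based recall. It is an illustration only: the word-overlap score and the function names load_first_lines and recall_relevant are hypothetical, and nothing here should be read as AWM's actual retrieval method, which the source does not describe in detail.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def load_first_lines(notes: list[str], limit: int = 200) -> list[str]:
    """Position-based loading: keep the first `limit` notes regardless of topic."""
    return notes[:limit]


def recall_relevant(notes: list[str], query: str, top_k: int = 3) -> list[str]:
    """Relevance-based loading: rank notes by word overlap with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(note)), note) for note in notes]
    # Keep only notes that share at least one word; highest overlap first.
    matches = [(score, note) for score, note in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in matches[:top_k]]


# 250 unrelated notes, then the one note that matters for the current task.
notes = [f"filler note {i}" for i in range(250)]
notes.append("members table uses soft deletes via deleted_at column")

print(len(load_first_lines(notes)))
print(recall_relevant(notes, "how do we delete rows in the members table"))
```

In this toy setup the relevant note sits past line 200, so position-based loading never sees it, while even a crude overlap score finds it.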
How It Works
AWM runs entirely locally on your machine with no cloud dependencies. The system consists of:
- A SQLite database for storage
- Three local ML models (~124MB total, downloaded once and cached)
- A Node.js process
There's no server to run, no Docker container, and no background daemon. When you start Claude Code, it automatically spins up AWM through MCP (Model Context Protocol). When you close the session, it stops. Everything stays local—no cloud, no API keys, no data leaving your machine. For extra security, AWM supports bearer token authentication to lock down access to the memory API.
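As a rough illustration of what bearer token protection generally involves, the sketch below checks an Authorization header against an expected token. The function name is_authorized and the header handling are assumptions for this example; the source does not say how AWM validates tokens internally.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for a header of the form 'Bearer <token>' with a matching token."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest runs in constant time, so timing does not reveal
    # how many leading characters of the token were correct.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


print(is_authorized("Bearer s3cret", "s3cret"))
print(is_authorized("Bearer wrong", "s3cret"))
```

The constant-time comparison is a common precaution for any secret check, even on a service that only listens locally.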
Setup and Usage
Installation requires two commands:
npm install -g agent-working-memory
awm setup --global
After restarting Claude Code, 14 memory tools appear automatically. The first session takes about 30 seconds while the ML models download (~124MB, cached after that). From that point on:
- The agent writes memories when it learns something important
- It recalls relevant memories when starting new work
- It checkpoints its state to recover after interruptions
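The write, recall, and checkpoint cycle listed above can be sketched with a small SQLite-backed store. This is a toy model under stated assumptions: the MemoryStore class, its table layout, and its method names are hypothetical and do not reflect AWM's real schema or its 14 MCP tools.

```python
import json
import sqlite3
from typing import Optional


class MemoryStore:
    """Toy store; table and method names are hypothetical, not AWM's schema."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, topic TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
        )

    def write(self, topic: str, content: str) -> None:
        """Persist something the agent learned."""
        self.db.execute(
            "INSERT INTO memories (topic, content) VALUES (?, ?)", (topic, content)
        )
        self.db.commit()

    def recall(self, topic: str) -> list[str]:
        """Return stored memories for a topic, oldest first."""
        rows = self.db.execute(
            "SELECT content FROM memories WHERE topic = ? ORDER BY id", (topic,)
        )
        return [content for (content,) in rows]

    def checkpoint(self, state: dict) -> None:
        """Save a snapshot of the agent's current task state as JSON."""
        self.db.execute(
            "INSERT INTO checkpoints (state) VALUES (?)", (json.dumps(state),)
        )
        self.db.commit()

    def restore(self) -> Optional[dict]:
        """Return the most recent checkpoint, or None if there is none."""
        row = self.db.execute(
            "SELECT state FROM checkpoints ORDER BY id DESC LIMIT 1"
        ).fetchone()
        return json.loads(row[0]) if row else None


store = MemoryStore()
store.write("schema", "members.email is unique")
store.checkpoint({"task": "migrate billing", "step": 3})
print(store.recall("schema"))
print(store.restore())
```

Passing a file path instead of the default in-memory database would let the data survive across processes, which is the property a cross-session memory depends on.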
The system was developed while rebuilding a 20-year-old codebase (~1.4 million lines) into a modern stack (an estimated ~250K lines). The project is a membership management platform with 88 database tables, built over multiple sprints with several AI agents working in parallel.
📖 Read the full source: r/ClaudeAI