mnemos: Go-Based Persistent Memory for AI Coding Agents MCP

mnemos is a persistent memory layer for AI coding agents, built as a single static Go binary (~15 MB) with no Python, no Docker, and no CGO. It uses pure Go SQLite via modernc.org/sqlite and provides hybrid retrieval (BM25 + vectors via RRF) with optional Ollama for embeddings. It's MCP-native, running against Claude Code, Cursor, Windsurf, and Codex CLI.

Verifier and Benchmarks

The author built a verifier that runs the same agent twice (with and without mnemos) under the same prompt and model, to measure concrete lift. Three verification modes ship in the binary:

mnemos verify retrieval – checks if the right memory surfaces for its trigger query
mnemos verify behavior – runs Claude with mnemos on vs off, counts how often transcript matches an assertion
mnemos verify capture – checks whether the agent records corrections handed to it during a task

Read-side results (n=5 paired runs on Claude Code):

session_start_on_edit: 5/5 with, 0/5 without (+100%)
oss_first_for_protocol: 5/5 with, 0/5 without (+100%)
no_ai_attribution_in_commit: 5/5 vs 5/5 (no lift)
no_cgo_proposal: 5/5 vs 5/5 (no lift)
migration_locked_refused: 5/5 vs 5/5 (no lift)

Aggregate +40%. Memory wins where the model's prior is wrong or absent (contrarian conventions, recursive tool memory). On widely-known best practices, no lift, but no degradation either.

Write-Side Capture

Initial baseline: agents recorded only 7% of corrections handed to them during a task. "Save this for future sessions" got skipped 3/3 times. After two rounds of fixes, capture reached 53%.

Round 1 (tool description tweaks): Added trigger phrase examples like "we tried X" or "going forward use Y". Moved from 7% to 13% (noise).
Round 2 (structural fix): Added a UserPromptSubmit hook that pattern-matches correction-shaped phrasing and emits a directive block into the prompt context. Agent still owns the structured tool call, but the trigger is non-skippable. Moved from 13% to 53%.

The remaining failure pattern: architectural decisions buried in larger task prompts still sit at 0/3 even with the directive. The stronger task framing seems to override it.

Tech Specs

Single static Go binary, ~15 MB
Pure Go SQLite via modernc.org/sqlite
Hybrid retrieval: BM25 + vectors via RRF, auto-detects Ollama, works fine without it
MCP-native: runs against Claude Code, Cursor, Windsurf, Codex CLI
Bi-temporal store, prompt injection scanner at write boundary, deterministic correction-to-skill promotion (no LLM in consolidation loop)
Local-first: nothing leaves your machine unless you explicitly point it at OpenAI for embeddings

Verifier Harness

The verifier lives in verify/ in the repo. Fixtures are YAML and scenarios are easy to add. The author notes n=5 is small and is working on a tau-bench pass@k benchmark next.

Repo: https://github.com/polyxmedia/mnemos

📖 Read the full source: r/LocalLLaMA