Local MCP Memory System with Consolidation for AI Conversations

By OpenClawRadar · Published: February 26, 2026

What This Is

A developer created a local memory system for AI conversations that consolidates and synthesizes information instead of just storing it. Built as an MCP (Model Context Protocol) server, it works with compatible clients such as Claude Desktop and Claude Code, and it runs 100% locally, so no memory data leaves your hardware.

How It Works

The key differentiator from standard RAG systems is the consolidation process. Every 6 hours, a local LLM (Qwen 2.5-7B running in LM Studio) clusters recent memories by topic and consolidates them into structured knowledge documents. It extracts facts, solutions, and preferences, merges them with existing knowledge, and versions everything it writes.
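The post doesn't say which clustering method the consolidation step uses. As a minimal sketch, assuming every memory already has an embedding vector, greedy centroid clustering by cosine similarity is one simple way to group recent memories into topics before handing each group to the consolidation model. `cluster_by_topic` and its 0.8 threshold are hypothetical choices for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_topic(embeddings: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Greedy single-pass clustering (hypothetical stand-in for the real step).

    Each memory joins the existing cluster whose centroid it is most similar to,
    provided that similarity is at least `threshold`; otherwise it starts a new
    cluster. Returns clusters as lists of indices into `embeddings`.
    """
    clusters: list[list[int]] = []
    centroids: list[list[float]] = []
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(vec))
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            # Running mean keeps the centroid representative of every member.
            centroids[best] = [(m * (n - 1) + v) / n for m, v in zip(centroids[best], vec)]
    return clusters


print(cluster_by_topic([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))  # [[0, 1], [2]]
```

Each resulting cluster would then go to the consolidation model as one batch. Greedy clustering is used here only because it is short and needs no dependencies; the real system may cluster differently.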

Technical Stack

  • Embeddings: nomic-embed-text-v1.5 via LM Studio
  • Vector search: FAISS (semantic + keyword hybrid)
  • Consolidation LLM: Qwen 2.5-7B (Q4) via LM Studio
  • Storage: SQLite for episodes, FAISS for vectors
  • Protocol: MCP — works with anything that supports it
  • Config: TOML

Features

  • Semantic dedup with a 0.95 cosine-similarity threshold
  • Adaptive surprise scoring — frequently accessed memories get boosted, stale ones decay
  • Atomic writes with tempfile + os.replace for crash protection
  • Tombstone-based FAISS deletion — O(1) instead of rebuilding the whole index
  • Graceful degradation — if LM Studio goes down, storage still works, consolidation pauses
  • 88 tests passing
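The post lists these features without code. Here is a minimal sketch of how semantic dedup and adaptive scoring could fit together, assuming each episode carries an embedding vector and a surprise score in [0, 1]. The 0.95 threshold comes from the post; `Episode`, `store`, the one-week half-life, and the 0.1 access boost are hypothetical. Decay is modeled as exponential (the score halves per half-life), which is one common choice and not necessarily the project's formula.

```python
import math
import time
from dataclasses import dataclass, field

DEDUP_THRESHOLD = 0.95       # from the post: cosine similarity >= 0.95 counts as a duplicate
HALF_LIFE_S = 7 * 24 * 3600  # hypothetical: an unaccessed score halves every week
ACCESS_BOOST = 0.1           # hypothetical: added to the score on each access


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Episode:
    text: str
    vector: list[float]
    surprise: float  # score in [0, 1]
    last_access: float = field(default_factory=time.time)

    def current_score(self, now: float) -> float:
        """Exponential decay: the score halves every HALF_LIFE_S seconds without access."""
        age = max(0.0, now - self.last_access)
        return self.surprise * 0.5 ** (age / HALF_LIFE_S)

    def touch(self, now: float) -> None:
        """Record an access: apply the decay so far, then add the boost (capped at 1.0)."""
        self.surprise = min(1.0, self.current_score(now) + ACCESS_BOOST)
        self.last_access = now


def store(episodes: list[Episode], new: Episode) -> bool:
    """Append `new` unless a stored episode is a near-duplicate; return True if stored.

    The linear scan stands in for a FAISS nearest-neighbour lookup.
    """
    if any(cosine(ep.vector, new.vector) >= DEDUP_THRESHOLD for ep in episodes):
        return False
    episodes.append(new)
    return True


episodes: list[Episode] = []
store(episodes, Episode("Prefers pytest", [1.0, 0.0], 0.5, last_access=0.0))
store(episodes, Episode("Likes pytest", [0.99, 0.01], 0.5, last_access=0.0))  # rejected as a duplicate
print(len(episodes))  # 1
```

Folding the decay into the score at access time means only touched episodes need writing; everything else can be scored lazily from `last_access` when ranking.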
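The two persistence features (atomic writes and tombstone deletion) can be sketched together, assuming episode ids are integers. `atomic_write_json` shows the tempfile + `os.replace` pattern. `TombstoneIndex` is a hypothetical pure-Python stand-in for FAISS (no FAISS calls are made): deletion just adds the id to a tombstone set, and searches skip dead ids. With a real FAISS index, one way to apply the same idea is to over-fetch neighbours and drop tombstoned ids before returning the top k.

```python
import json
import math
import os
import tempfile


def atomic_write_json(path: str, data: object) -> None:
    """Write JSON to a temp file in the target directory, then os.replace() it over
    `path`, so a crash leaves either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave stray temp files behind
        raise


class TombstoneIndex:
    """Pure-Python stand-in for a FAISS index with tombstone deletion."""

    def __init__(self) -> None:
        self.vectors: dict[int, list[float]] = {}
        self.tombstones: set[int] = set()

    def add(self, ident: int, vector: list[float]) -> None:
        self.vectors[ident] = vector

    def delete(self, ident: int) -> None:
        # O(1): mark the id as dead; the vector stays until compact().
        self.tombstones.add(ident)

    def search(self, query: list[float], k: int) -> list[int]:
        """Return up to k live ids, most similar first (brute-force cosine)."""
        def cos(a: list[float], b: list[float]) -> float:
            norm = math.hypot(*a) * math.hypot(*b)
            return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

        scored = [(cos(query, v), i) for i, v in self.vectors.items() if i not in self.tombstones]
        scored.sort(reverse=True)
        return [i for _, i in scored[:k]]

    def compact(self) -> None:
        """Occasional rebuild: physically drop tombstoned vectors."""
        for ident in self.tombstones:
            self.vectors.pop(ident, None)
        self.tombstones.clear()

    def save_tombstones(self, path: str) -> None:
        atomic_write_json(path, sorted(self.tombstones))
```

The trade-off is that dead vectors still occupy memory until `compact()` runs, so compaction can be scheduled alongside consolidation instead of on every delete.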

MCP Tools

  • memory_store — save an episode with type, tags, surprise score
  • memory_recall — semantic search across episodes + consolidated knowledge
  • memory_forget — mark an episode for removal
  • memory_correct — update a knowledge doc
  • memory_export — full JSON backup
  • memory_status — health check
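MCP clients such as Claude Desktop build tool requests themselves, but the wire format makes the tool list concrete. This sketch serializes a `tools/call` request the way MCP frames it over JSON-RPC 2.0. The tool names come from the post; the argument names (`content`, `type`, `tags`, `surprise`, `query`) are assumptions, since the post doesn't publish the tools' input schemas.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical argument names; a server advertises its real input schemas via tools/list.
store_req = tool_call("memory_store", {
    "content": "Prefers pytest over unittest",
    "type": "preference",
    "tags": ["python", "testing"],
    "surprise": 0.6,
})
recall_req = tool_call("memory_recall", {"query": "testing preferences"})
print(store_req)
```

Because every client speaks this same request shape, any MCP-capable front end can read from and write to the one memory store.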

Why MCP Was Chosen

Models get replaced frequently, but accumulated knowledge shouldn't disappear with them. MCP makes the memory portable — one store, many interfaces. The memory layer becomes more valuable than any individual model.

Practical Results

After about a week of use, the system had built knowledge documents about PC hardware, VR setup, coding preferences, and project architectures, all synthesized from normal conversation. In new chats, the AI already knows the user's context, so nothing has to be re-explained.

Requirements

  • Python 3.11+
  • LM Studio with Qwen 2.5-7B and nomic-embed-text-v1.5 loaded
  • Any MCP client

📖 Read the full source: r/LocalLLaMA
