Local MCP Memory System with Consolidation for AI Conversations

What This Is
A developer built a local memory system for AI conversations that consolidates and synthesizes information instead of just storing it. It runs as an MCP server, so it works with compatible clients such as Claude Desktop and Claude Code. Everything runs locally, and no data leaves your machine.
How It Works
The key differentiator from standard RAG systems is the consolidation process. Every 6 hours, a local LLM (Qwen 2.5-7B running in LM Studio) clusters recent memories by topic and consolidates them into structured knowledge documents. It extracts facts, solutions, and preferences, merging them with existing knowledge and versioning everything.
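The post doesn't say how memories are grouped by topic before the LLM sees them. A minimal sketch of one possible approach, assuming each episode already has an embedding vector: a greedy single-pass clustering on cosine similarity against running centroids. The function names and the 0.8 threshold are hypothetical, and the LLM summarization step that follows is left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_topic(embeddings: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Greedy clustering: each episode joins the first cluster whose centroid is
    at least `threshold` similar, otherwise it starts a new cluster.
    The 0.8 default is a hypothetical value, not taken from the project."""
    clusters: list[list[int]] = []
    centroids: list[list[float]] = []
    for i, emb in enumerate(embeddings):
        for c, centroid in enumerate(centroids):
            if cosine(emb, centroid) >= threshold:
                clusters[c].append(i)
                # Update the centroid as the running mean of its members
                n = len(clusters[c])
                centroids[c] = [(m * (n - 1) + e) / n for m, e in zip(centroid, emb)]
                break
        else:
            # No close cluster found: this episode seeds a new topic
            clusters.append([i])
            centroids.append(list(emb))
    return clusters
```

Each resulting cluster (a list of episode indices) would then be handed to the local LLM to extract facts, solutions, and preferences. A greedy pass is order-dependent but cheap, which suits a job that only runs every few hours.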
Technical Stack
- Embeddings: nomic-embed-text-v1.5 via LM Studio
- Search: hybrid of semantic vector search (FAISS) and keyword matching
- Consolidation LLM: Qwen 2.5-7B (Q4) via LM Studio
- Storage: SQLite for episodes, FAISS for vectors
- Protocol: MCP — works with anything that supports it
- Config: TOML
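FAISS handles only the vector side, so the keyword half of the hybrid search has to be scored separately and blended in. A minimal sketch, assuming a weighted sum of cosine similarity and a simple keyword-overlap score; the weight `alpha`, the overlap measure, and the function names are hypothetical, and a brute-force loop stands in for the FAISS query.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that also appear in the text (case-insensitive)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    t_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(
    query: str,
    query_emb: list[float],
    docs: list[tuple[str, list[float]]],
    k: int = 3,
    alpha: float = 0.7,
) -> list[tuple[float, str]]:
    """Rank (text, embedding) docs by alpha * semantic + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine(query_emb, emb) + (1 - alpha) * keyword_score(query, text), text)
        for text, emb in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The keyword term helps with exact identifiers (model names, error codes) that embeddings tend to blur, while the semantic term catches paraphrases.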
Features
- Semantic dedup with cosine similarity 0.95 threshold
- Adaptive surprise scoring — frequently accessed memories get boosted, stale ones decay
- Atomic writes with tempfile + os.replace for crash protection
- Tombstone-based FAISS deletion — O(1) instead of rebuilding the whole index
- Graceful degradation — if LM Studio goes down, storage still works, consolidation pauses
- 88 tests passing
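For the semantic dedup feature, a minimal sketch of the check, assuming each new episode's embedding is compared against existing embeddings and rejected when any similarity reaches the 0.95 threshold from the post. Whether the project uses `>=` or `>`, and whether it compares against all episodes or only recent ones, is an assumption; the function names are hypothetical.

```python
import math

DEDUP_THRESHOLD = 0.95  # cosine similarity threshold stated in the post


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_duplicate(new_emb: list[float], existing: list[list[float]],
                 threshold: float = DEDUP_THRESHOLD) -> bool:
    """True if the new embedding is near-identical to any stored one."""
    return any(cosine(new_emb, emb) >= threshold for emb in existing)


def store_if_new(new_emb: list[float], existing: list[list[float]]) -> bool:
    """Append the embedding unless it duplicates an existing one; report whether it was stored."""
    if is_duplicate(new_emb, existing):
        return False
    existing.append(new_emb)
    return True
```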
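The post doesn't give the formula behind adaptive surprise scoring. One plausible shape, sketched here purely as an assumption: the score decays exponentially with time since last access, and each recall folds in that decay and then adds a fixed boost, capped at 1.0. The half-life, boost size, and all names are hypothetical.

```python
from dataclasses import dataclass

HALF_LIFE_S = 7 * 24 * 3600  # hypothetical: an idle memory's score halves after a week
ACCESS_BOOST = 0.1           # hypothetical: score added on each recall


@dataclass
class Memory:
    surprise: float      # stored score in [0, 1]
    last_access: float   # Unix timestamp in seconds
    access_count: int = 0


def effective_score(m: Memory, now: float) -> float:
    """Stored score decayed exponentially by the time since last access."""
    idle = max(0.0, now - m.last_access)
    return m.surprise * 0.5 ** (idle / HALF_LIFE_S)


def record_access(m: Memory, now: float) -> None:
    """Apply pending decay, then boost the score for this recall (capped at 1.0)."""
    m.surprise = min(1.0, effective_score(m, now) + ACCESS_BOOST)
    m.last_access = now
    m.access_count += 1
```

Computing decay lazily at read time avoids a background job that rewrites every score; only recalled memories are ever updated.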
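The atomic-write pattern the post names (`tempfile` plus `os.replace`) can be sketched as follows. The function name and the choice of JSON are assumptions for illustration; the key detail is creating the temp file in the same directory as the target, since a rename is only atomic within one filesystem.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Write JSON so that readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the partial temp file if anything failed before the replace
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process crashes mid-write, only the `.tmp` file is damaged; the previous version of the target file stays intact.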
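Tombstone deletion means recording deleted IDs and filtering them out at query time instead of rebuilding the index. A minimal sketch of the idea, with a plain dict and brute-force dot-product ranking standing in for FAISS (no FAISS API is used here); the class and method names are hypothetical.

```python
class TombstoneIndex:
    """Vectors stay in the underlying store; deletes only mark IDs as dead."""

    def __init__(self) -> None:
        self.vectors: dict[int, list[float]] = {}  # stand-in for the FAISS index
        self.tombstones: set[int] = set()

    def add(self, vid: int, vec: list[float]) -> None:
        self.vectors[vid] = vec

    def delete(self, vid: int) -> None:
        # O(1): no index rebuild, just remember the ID is gone
        self.tombstones.add(vid)

    def _raw_search(self, query: list[float], n: int) -> list[int]:
        # Brute-force dot-product ranking, standing in for an index search
        def score(vid: int) -> float:
            return sum(q * v for q, v in zip(query, self.vectors[vid]))
        return sorted(self.vectors, key=score, reverse=True)[:n]

    def search(self, query: list[float], k: int) -> list[int]:
        # Over-fetch by the tombstone count so filtering still leaves k hits
        candidates = self._raw_search(query, k + len(self.tombstones))
        return [vid for vid in candidates if vid not in self.tombstones][:k]

    def compact(self) -> None:
        # Occasional cleanup: physically drop tombstoned vectors
        for vid in self.tombstones:
            self.vectors.pop(vid, None)
        self.tombstones.clear()
```

The O(1) cost applies to the delete itself; dead vectors still take space and are scanned until a compaction or rebuild runs.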
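Graceful degradation means storage never depends on the LLM being up. A minimal sketch, assuming episodes go straight to SQLite with a `consolidated` flag and the consolidation job leaves them pending when the LLM call fails. The schema, function names, and the injected `summarize` callable (standing in for the real LM Studio request) are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("memory")


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, consolidated INTEGER DEFAULT 0)"
    )


def store_episode(conn: sqlite3.Connection, text: str) -> None:
    """Storage path: plain SQLite write, no LLM involved."""
    conn.execute("INSERT INTO episodes (text) VALUES (?)", (text,))
    conn.commit()


def run_consolidation(conn: sqlite3.Connection, summarize) -> bool:
    """Send pending episodes to `summarize`; if the LLM is unreachable, keep them pending."""
    rows = conn.execute("SELECT id, text FROM episodes WHERE consolidated = 0").fetchall()
    if not rows:
        return True
    try:
        summarize([text for _, text in rows])
    except OSError as exc:  # covers ConnectionError, TimeoutError, urllib's URLError
        log.warning("LLM unavailable, consolidation paused: %s", exc)
        return False
    conn.executemany("UPDATE episodes SET consolidated = 1 WHERE id = ?",
                     [(row_id,) for row_id, _ in rows])
    conn.commit()
    return True
```

Because episodes are only marked consolidated after a successful call, nothing is lost during an outage; the next scheduled run picks them up.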
MCP Tools
- memory_store: save an episode with type, tags, and a surprise score
- memory_recall: semantic search across episodes and consolidated knowledge
- memory_forget: mark an episode for removal
- memory_correct: update a knowledge document
- memory_export: full JSON backup
- memory_status: health check
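Under MCP, a client invokes these tools with a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and an `arguments` object. A minimal sketch of such a request for `memory_store`; the helper name is hypothetical, and the argument keys (`content`, `type`, `tags`, `surprise`) are assumptions based on the fields listed above, not the server's actual schema.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument names; the real server's input schema may differ
msg = tools_call(1, "memory_store", {
    "content": "Prefers TOML over YAML for config files",
    "type": "preference",
    "tags": ["config", "coding"],
    "surprise": 0.6,
})
```

In practice a client discovers each tool's real input schema with a `tools/list` request before calling it.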
Why MCP Was Chosen
Models get replaced frequently, but accumulated knowledge shouldn't disappear with them. MCP makes the memory portable — one store, many interfaces. The memory layer becomes more valuable than any individual model.
Practical Results
After about a week of use, the system had built knowledge documents about PC hardware, VR setup, coding preferences, and project architectures, all synthesized from normal conversation. When the user starts a new chat, the AI already knows their context, so nothing needs to be re-explained.
Requirements
- Python 3.11+
- LM Studio with Qwen 2.5-7B and nomic-embed-text-v1.5 loaded
- Any MCP client
📖 Read the full source: r/LocalLLaMA