Persistent Memory for Claude: Local Stack with MCP, 39ms Retrieval, 82% Token Reduction

A Reddit user built a local persistent memory layer for Claude that solves the zero-context problem between sessions. The stack runs entirely locally (no cloud, no API keys) and integrates via MCP. Key architecture: four layers (L0 append-only event log in SQLite, L1 structured facts deferred, L2/L3 wiki prose, L4 crystallized session nodes with summary + decisions + open threads), Qdrant Docker for vector search, llama.cpp with Qwen3-Embedding-4B on GPU and Qwen3.5-2B-Q4_K_M on CPU for embedding and chat, and a FastMCP server exposing 7 tools (retrieve, crystallize_session, list_sessions, get_l4_node, index_status, reindex, shutdown_models).
Numbers
- Token reduction vs grep+Read baseline: 82.7% mean, 86.2% median.
- Retrieval F1: 0.50 vs 0.20 baseline.
- Embed cold start ~4s; hot-path p95 39ms (was 2241ms before bug fix).
- L4 session retrieval eval: 0.920 mean score (gate 0.6).
- 738 chunks indexed across 104 markdown files.
Key Learned: Connection Reuse on Windows
The hot-path retrieve was stuck at 2241ms p95 even with GPU-resident embedding on a 4070 Ti Super. The cause: every httpx.post() opened a fresh TCP connection, and Windows localhost handshakes took ~2 seconds. Switching to a persistent httpx.Client with keep-alive dropped p95 to 39ms — a 57× speedup.
Other Surprises
- Qwen3 thinking mode: If
enable_thinkingis not disabled viachat_template_kwargs: {enable_thinking: false}with--jinjaon llama-server, the model spends all token budget on thinking blocks and outputs empty content. - MCP registration: Claude Desktop's agentic mode (Cowork) reads a plugin config file, not
~/.claude.json. The LKS service must be packaged as a proper Cowork .plugin bundle.
Who It's For
Developers who use Claude heavily and want a cost-effective, private, local memory layer that maintains context across sessions without cloud dependencies.
📖 Read the full source: r/ClaudeAI
👀 See Also

VibeSmith: Local Tool for Detecting Skill Conflicts in Claude Code Projects
VibeSmith is a local macOS desktop app that provides unified visibility across Claude Code projects, detecting conflicts when global and project-level components share names, visualizing dependencies as DAGs, and tracking context token usage.

PowerShell Script Automates OpenClaw Docker Setup on Windows
A PowerShell script handles Windows-specific networking quirks and Docker configuration for OpenClaw, automating checks, image retrieval, setup guidance, and container deployment.

Codex Chrome Extension Adds Background Browser Automation Across Tabs
Codex's new Chrome extension on macOS/Windows enables parallel browser task execution in background tabs without taking over the browser — covering debugging flows, dashboards, research, and CRM updates.

RAG Learning Academy Built Inside Claude Code with 20 Specialist Agents
A developer created an interactive RAG learning academy inside Claude Code featuring 20 specialist agents, 17 slash commands, and a 9-module curriculum that assesses knowledge level and uses open-source tools by default.