Vektori's Memory Architecture: Principles from Claude's Leaked System

Memory Architecture Principles
The Claude Code team shared how their memory system works, and several principles stand out. Memory is an index, not storage: MEMORY.md holds only pointers (150 characters per line), while the real knowledge lives in separate files fetched on demand. Raw transcripts are never loaded into context; they are only grepped when needed. Three layers exist, each with a different access cost. The sharpest principle: if something is derivable, do not store it. Retrieval is skeptical: memory is a hint, not the truth, and the model verifies it before using it.
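A minimal Python sketch of the index-not-storage idea, under stated assumptions: the pointer format (`topic: relative/path.md`), the directory layout, and the helper names `load_index`, `fetch_topic`, and `grep_transcripts` are hypothetical illustrations, not Claude Code's actual file format or API. The small index is parsed eagerly, a knowledge file is read only when its topic is requested, and transcripts are searched line by line instead of being loaded whole.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

MAX_POINTER_CHARS = 150  # per-line budget for index entries, as described above

def load_index(index_path: Path) -> dict[str, str]:
    """Parse index lines of the hypothetical form 'topic: relative/path.md'."""
    pointers: dict[str, str] = {}
    for raw in index_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not a pointer
        if len(line) > MAX_POINTER_CHARS:
            raise ValueError(f"index line exceeds {MAX_POINTER_CHARS} characters")
        topic, target = line.split(":", 1)
        pointers[topic.strip()] = target.strip()
    return pointers

def fetch_topic(index_path: Path, topic: str) -> Optional[str]:
    """Read the knowledge file a pointer refers to, only when it is asked for."""
    target = load_index(index_path).get(topic)
    if target is None:
        return None
    return (index_path.parent / target).read_text(encoding="utf-8")

def grep_transcripts(transcript_dir: Path, pattern: str) -> list[str]:
    """Return matching lines from raw transcripts without loading them into context."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits: list[str] = []
    for path in sorted(transcript_dir.glob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if rx.search(line):
                    hits.append(f"{path.name}:{lineno}: {line.rstrip()}")
    return hits

# Usage: build a tiny hypothetical memory directory in a temp folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "MEMORY.md").write_text("build: notes/build.md\n", encoding="utf-8")
    (root / "notes").mkdir()
    (root / "notes" / "build.md").write_text("Run tests with pytest -q\n", encoding="utf-8")
    (root / "transcripts").mkdir()
    (root / "transcripts" / "s1.txt").write_text("user: use pytest\nagent: ok\n", encoding="utf-8")
    build_notes = fetch_topic(root / "MEMORY.md", "build")
    transcript_hits = grep_transcripts(root / "transcripts", r"pytest")
```

The design choice worth noting is that the only thing always paid for is the index; both the topic file and the transcript lines are fetched per request, and anything they return would still need to be verified before use.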
Vektori's Implementation
Vektori applies the same principles with a different shape. While Claude uses a file hierarchy, Vektori implements a hierarchical sentence graph with three layers:
- FACT LAYER (L0): Crisp statements that serve as the search surface. Cheap and always queryable.
- EPISODE LAYER (L1): Episodes that span conversations, discovered automatically.
- SENTENCE LAYER (L2): Raw conversation, fetched only when explicitly needed.
The same access model applies: L0 is the index and L2 is the transcript (grepped, not dumped). You pay only for what you need.
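The layered access model can be sketched as a small in-memory graph. This is an illustration under assumptions, not Vektori's code: the class names, the keyword match standing in for vector search, and the `expand(depth=...)` helper are all hypothetical. The point it shows is that a query touches only L0, and each extra level of detail is paid for explicitly.

```python
from dataclasses import dataclass, field

@dataclass
class Sentence:  # L2: raw conversation text
    sid: str
    session: str
    text: str

@dataclass
class Episode:  # L1: groups sentences, possibly across sessions
    eid: str
    summary: str
    sentence_ids: list[str] = field(default_factory=list)

@dataclass
class Fact:  # L0: short statement, the search surface
    fid: str
    text: str
    episode_ids: list[str] = field(default_factory=list)

class SentenceGraph:
    """Hypothetical three-layer store; keyword match stands in for vector search."""

    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}
        self.episodes: dict[str, Episode] = {}
        self.sentences: dict[str, Sentence] = {}

    def search_facts(self, keyword: str) -> list[Fact]:
        # Only L0 is scanned at query time, which keeps search cheap.
        kw = keyword.lower()
        return [f for f in self.facts.values() if kw in f.text.lower()]

    def expand(self, fact: Fact, depth: int) -> dict:
        """depth 0: fact only; 1: add episode summaries; 2: add raw sentences."""
        result: dict = {"fact": fact.text}
        if depth >= 1:
            eps = [self.episodes[e] for e in fact.episode_ids]
            result["episodes"] = [e.summary for e in eps]
            if depth >= 2:
                result["sentences"] = [self.sentences[s].text for e in eps for s in e.sentence_ids]
        return result

g = SentenceGraph()
g.sentences["s1"] = Sentence("s1", "session-1", "I'd rather deploy on Fridays.")
g.sentences["s2"] = Sentence("s2", "session-7", "Let's keep Friday deploys.")
g.episodes["e1"] = Episode("e1", "Deployment scheduling discussion", ["s1", "s2"])
g.facts["f1"] = Fact("f1", "User prefers Friday deployments", ["e1"])

hit = g.search_facts("friday")[0]
shallow = g.expand(hit, depth=1)  # fact plus episode summary, no transcript
deep = g.expand(hit, depth=2)     # drills down to the raw sentences
```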
Strict Write Discipline
Nothing enters L0 without passing quality filters: a minimum character count, a content-density check, and a pronoun-ratio check. If a sentence is too vague or is pure filler, it never becomes a fact. This mirrors Claude's principle of not storing derivable things.
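A sketch of what such a write filter might look like. The source names the three checks but not their thresholds or word lists, so `MIN_CHARS`, `MIN_CONTENT_DENSITY`, `MAX_PRONOUN_RATIO`, and both vocabularies below are hypothetical values chosen only for illustration. Content density here means the share of tokens that are neither stopwords nor pronouns.

```python
import re

# Hypothetical thresholds; the article does not publish Vektori's values.
MIN_CHARS = 20
MIN_CONTENT_DENSITY = 0.5
MAX_PRONOUN_RATIO = 0.25

PRONOUNS = {"i", "me", "you", "he", "she", "it", "we", "they", "him", "her",
            "us", "them", "this", "that", "these", "those"}
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "be", "to", "of",
             "and", "or", "in", "on", "at", "so", "just", "really", "like",
             "um", "uh", "yeah", "ok", "okay"}

def passes_write_filter(sentence: str) -> bool:
    """Return True only if the sentence looks specific enough to become an L0 fact."""
    text = sentence.strip()
    if len(text) < MIN_CHARS:
        return False  # too short to carry a standalone fact
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return False
    pronoun_ratio = sum(t in PRONOUNS for t in tokens) / len(tokens)
    content = [t for t in tokens if t not in PRONOUNS and t not in STOPWORDS]
    density = len(content) / len(tokens)
    return density >= MIN_CONTENT_DENSITY and pronoun_ratio <= MAX_PRONOUN_RATIO
```

A high pronoun ratio is a reasonable proxy for vagueness because a sentence like "yeah I think that is like it" only makes sense with surrounding context, which is exactly what a standalone fact lacks.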
Retrieval Mechanics
Retrieval works as Claude describes it: scored, thresholded, and skeptical. A result must reach a minimum score of 0.3 before it surfaces. Results are ranked by vector similarity combined with temporal decay rather than retrieved blindly.
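One plausible way to combine similarity and decay under a 0.3 threshold is sketched below. Only the 0.3 cutoff comes from the text; the weighted blend, the 30-day half-life, and the 0.2 recency weight are assumptions, since the article does not say how Vektori combines the two signals.

```python
import math
from dataclasses import dataclass

MIN_SCORE = 0.3         # threshold stated in the text
HALF_LIFE_DAYS = 30.0   # hypothetical decay half-life
RECENCY_WEIGHT = 0.2    # hypothetical weight of recency relative to similarity

@dataclass
class Candidate:
    text: str
    embedding: list[float]
    age_days: float

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recency(age_days: float) -> float:
    # Exponential decay: 1.0 today, 0.5 after one half-life, 0.25 after two.
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(query: list[float], cand: Candidate) -> float:
    sim = cosine(query, cand.embedding)
    return (1 - RECENCY_WEIGHT) * sim + RECENCY_WEIGHT * recency(cand.age_days)

def retrieve(query: list[float], candidates: list[Candidate], k: int = 5) -> list[tuple[float, str]]:
    scored = [(score(query, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= MIN_SCORE]  # weak matches never surface
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), c.text) for s, c in kept[:k]]

memories = [
    Candidate("User prefers dark mode", [1.0, 0.0], age_days=0),
    Candidate("User uses a dark editor theme", [0.8, 0.6], age_days=60),
    Candidate("User's cat is named Miso", [0.0, 1.0], age_days=0),
]
results = retrieve([1.0, 0.0], memories)
```

With these weights, recency alone contributes at most 0.2, so a fresh but irrelevant memory (the third candidate) can never clear the 0.3 threshold on age alone.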
Architectural Divergence on Corrections
Claude's approach is optimized for single-user project contexts where the latest state is what matters. Vektori, designed for agents working across hundreds of sessions, preserves correction history instead. When a user changes their mind, the old fact stays in the graph along with its sentence links, so the agent can trace back to what was said before the change and why the fact was superseded.
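The correction-history idea can be sketched as supersession links instead of overwrites. The `FactStore` class, its method names, and the `reason` field are hypothetical; the sketch also assumes a simple linear chain, where each fact is replaced by at most one newer fact.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    fid: str
    text: str
    sentence_ids: list[str]              # links back to the raw L2 sentences
    superseded_by: Optional[str] = None  # id of the fact that replaced this one
    reason: Optional[str] = None         # why it was superseded

class FactStore:
    """Hypothetical store that marks old facts as superseded instead of deleting them."""

    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}

    def add(self, fact: Fact) -> None:
        self.facts[fact.fid] = fact

    def supersede(self, old_id: str, new_fact: Fact, reason: str) -> None:
        self.add(new_fact)
        old = self.facts[old_id]
        old.superseded_by = new_fact.fid  # the old fact stays in the store
        old.reason = reason

    def current(self) -> list[Fact]:
        return [f for f in self.facts.values() if f.superseded_by is None]

    def history(self, fid: str) -> list[Fact]:
        """Return the chain of facts ending at fid, oldest first (assumes a linear chain)."""
        predecessor = {f.superseded_by: f for f in self.facts.values() if f.superseded_by}
        chain = [self.facts[fid]]
        while chain[0].fid in predecessor:
            chain.insert(0, predecessor[chain[0].fid])
        return chain

store = FactStore()
store.add(Fact("f1", "User wants weekly reports", ["s12"]))
store.supersede("f1", Fact("f2", "User wants daily reports", ["s87"]),
                reason="User said weekly was too slow")
chain = store.history("f2")
current_texts = [f.text for f in store.current()]
```

Normal retrieval only needs `current()`, but a question like "why did the report schedule change?" can walk `history()` and follow each fact's `sentence_ids` back to the original conversation.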
Performance and Future
On LongMemEval-S, Vektori achieved 73% accuracy at L1 depth using BGE-M3 and Gemini 2.5 Flash-Lite. Multi-hop conflict resolution, where you reason about how a fact changed over time, is where triple-based (subject-predicate-object) systems collapse. The next layer stores the why: causal edges between events ("user corrected X, agent updated Y, user disputed again"), extracted asynchronously and queryable as a graph. Agent trajectories then become memory, so the agent's own behavior becomes part of what it can reason about.
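The causal-edge layer is described as future work, so the following is only a sketch of the data structure, not Vektori's design. The `CausalGraph` class and its methods are hypothetical, and the asynchronous extraction step is not shown: events and edges are entered by hand. Storing edges in both directions lets the same graph answer "what followed from this?" and "why did this happen?".

```python
from collections import defaultdict, deque

class CausalGraph:
    """Hypothetical event graph: nodes are events, edges mean 'led to'."""

    def __init__(self) -> None:
        self.events: dict[str, str] = {}
        self.led_to = defaultdict(list)     # event id -> ids of events it caused
        self.caused_by = defaultdict(list)  # event id -> ids of events that caused it

    def add_event(self, eid: str, description: str) -> None:
        self.events[eid] = description

    def add_edge(self, cause: str, effect: str) -> None:
        self.led_to[cause].append(effect)
        self.caused_by[effect].append(cause)

    def _walk(self, start: str, edges) -> list[str]:
        # Breadth-first traversal; .get avoids inserting empty keys into the defaultdict.
        seen, order, queue = {start}, [], deque([start])
        while queue:
            current = queue.popleft()
            for nxt in edges.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return [self.events[e] for e in order]

    def consequences(self, eid: str) -> list[str]:
        return self._walk(eid, self.led_to)

    def explain(self, eid: str) -> list[str]:
        return self._walk(eid, self.caused_by)

graph = CausalGraph()
for eid, desc in [("e1", "user corrected X"), ("e2", "agent updated Y"), ("e3", "user disputed again")]:
    graph.add_event(eid, desc)
graph.add_edge("e1", "e2")
graph.add_edge("e2", "e3")

why = graph.explain("e3")           # walks backwards from the dispute
after = graph.consequences("e1")    # walks forwards from the correction
```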
📖 Read the full source: r/ClaudeAI