Claude Code's File-Based Memory System: A Pragmatic Alternative to Vector DBs

Claude Code uses a file-based approach for agent memory that replaces the typical vector database and embeddings setup. Instead of full RAG, it stores each memory as a Markdown (.md) file with a short frontmatter block (name, description, and type), plus a MEMORY.md file that acts as an index.
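To make the file layout concrete, here is a minimal sketch of what such a memory file might look like and how its frontmatter could be read. The field names (name, description, type) come from the description above; the sample values, the `---` delimiters, and the `parse_frontmatter` helper are assumptions for illustration, not Claude Code's actual implementation.

```python
# Hypothetical memory file; the values are invented for illustration.
SAMPLE_MEMORY = """\
---
name: preferred-test-runner
description: User prefers pytest over unittest for new tests
type: preference
---
Use pytest for any new test files. Existing unittest suites stay as they are.
"""


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (metadata, body).

    Handles only flat `key: value` lines between two `---` lines.
    This is an assumed format, not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat the whole file as body
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: fall back to plain body


meta, body = parse_frontmatter(SAMPLE_MEMORY)
print(meta["name"], "|", meta["type"])
```

Because the metadata sits at the top of the file, a reader only needs the first few lines to decide whether the memory is worth loading in full.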
How the System Works
At runtime, the system doesn't embed or search everything. It follows this process:
- Scans memory files (capped at approximately 200, newest first)
- Reads just the first ~30 lines (primarily metadata)
- Builds a lightweight manifest
- Uses a small model to pick the top ~5 relevant memories
- Loads only those selected memories into context (with size limits)
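The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real code: the function names, the `MAX_CHARS` limit, and the choice to skip MEMORY.md during the scan are all hypothetical, and the small-model selection step is replaced here by a plain word-overlap score so the example runs without any API.

```python
from itertools import islice
from pathlib import Path

MAX_FILES = 200      # approximate scan cap from the description above
HEADER_LINES = 30    # only the top of each file is read for the manifest
TOP_K = 5            # approximate number of memories selected
MAX_CHARS = 4_000    # hypothetical per-memory size limit


def build_manifest(memory_dir: Path) -> list[dict]:
    """Scan memory files newest first and keep only their header lines."""
    files = sorted(
        # Assumption: the MEMORY.md index is not itself treated as a memory.
        (p for p in memory_dir.glob("*.md") if p.name != "MEMORY.md"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )[:MAX_FILES]
    manifest = []
    for path in files:
        with path.open(encoding="utf-8") as fh:
            header = "".join(islice(fh, HEADER_LINES))
        manifest.append({"path": path, "header": header})
    return manifest


def select_relevant(manifest: list[dict], query: str, k: int = TOP_K) -> list[dict]:
    """Stand-in for the small-model step: rank entries by word overlap."""
    query_words = set(query.lower().split())

    def score(entry: dict) -> int:
        return len(query_words & set(entry["header"].lower().split()))

    # sorted() is stable, so ties keep the newest-first manifest order.
    ranked = sorted(manifest, key=score, reverse=True)
    return [entry for entry in ranked if score(entry) > 0][:k]


def load_selected(entries: list[dict], max_chars: int = MAX_CHARS) -> list[str]:
    """Read the full text of the selected memories, truncated to a hard cap."""
    return [e["path"].read_text(encoding="utf-8")[:max_chars] for e in entries]
```

A caller would chain the three functions, for example `load_selected(select_relevant(build_manifest(Path("memories")), "which test runner should I use"))`, where the `memories` directory name is again a placeholder. Every stage has a fixed upper bound (files scanned, lines read, memories chosen, characters loaded), which is what keeps the cost predictable.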
Key Advantages
The design offers several practical benefits:
- Cost-effective: Bounded files, bounded tokens, predictable costs
- Fast: No embedding or similarity search operations
- Controlled: Only injects a few memories with hard caps everywhere
- Human-readable: Everything is stored as markdown files
- Less garbage: Explicitly avoids storing information that can already be derived from the repository
The system treats memory as "maybe stale" rather than as absolute truth, so a recalled memory works as a hint to check rather than a fact to rely on. This design is particularly pragmatic for coding and debugging agents, where most "memory" consists of preferences, context, or external references rather than large knowledge bases.
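One way to express the "maybe stale" idea is to attach an age note to each memory before it enters the context. The helper below is purely a sketch: the function name and the wording of the note are assumptions, and the source does not say how Claude Code actually signals staleness.

```python
import time
from typing import Optional


def frame_as_possibly_stale(
    text: str, modified_at: float, now: Optional[float] = None
) -> str:
    """Prefix a memory with its age so the model treats it as a hint.

    `modified_at` and `now` are Unix timestamps in seconds; `now`
    defaults to the current time. The note wording is hypothetical.
    """
    current = time.time() if now is None else now
    age_days = int((current - modified_at) // 86400)
    note = (
        f"[Memory last updated {age_days} day(s) ago; "
        "verify against the current repository before relying on it.]"
    )
    return f"{note}\n{text}"
```

Passing a file's modification time (for example `path.stat().st_mtime`) as `modified_at` would let older memories carry a visibly larger age in the prompt.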
While this approach doesn't replace RAG for all use cases, it represents a solid tradeoff for development agents where simplicity and predictability matter more than comprehensive knowledge retrieval.
📖 Read the full source: r/ClaudeAI