Self-Hosted Memory Layer for Claude Runs Free on Cloudflare

A Reddit user built second-brain-cloudflare, an open-source MCP server that adds persistent memory to Claude. It runs entirely on Cloudflare's free tier using Workers, D1 (SQLite), Vectorize, and Workers AI.
Key Features
- Four MCP tools: remember, recall, list_recent, forget.
- Semantic search via recall: notes are vector-embedded using the bge-small-en-v1.5 model via Workers AI and stored in Cloudflare Vectorize. Searches match by meaning, not keywords.
- Works with Claude Desktop, Claude Code, and claude.ai (via custom connectors).
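
To make "match by meaning, not keywords" concrete, here is a minimal sketch of similarity-based recall. It is not the project's code: the names (StoredNote, cosineSimilarity, topMatches) are hypothetical, the 3-dimensional vectors are invented for illustration (bge-small-en-v1.5 produces 384-dimensional embeddings), and cosine similarity is assumed as the metric, since the source does not say how the Vectorize index is configured.

```typescript
// Toy stand-in for semantic recall: rank stored notes by cosine similarity.
// Vectors are made up; a real deployment would get them from an embedding model.

interface StoredNote {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK notes whose vectors are closest to the query vector.
function topMatches(query: number[], notes: StoredNote[], topK: number): StoredNote[] {
  return notes
    .map((note) => ({ note, score: cosineSimilarity(query, note.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((entry) => entry.note);
}

const notes: StoredNote[] = [
  { id: "n1", text: "Dentist appointment on Friday", vector: [0.9, 0.1, 0.0] },
  { id: "n2", text: "Prefers tabs over spaces", vector: [0.0, 0.2, 0.9] },
  { id: "n3", text: "Teeth cleaning costs $120", vector: [0.8, 0.3, 0.1] },
];

// Hypothetical embedding of a dental-themed query.
const queryVector = [1.0, 0.2, 0.0];
const results = topMatches(queryVector, notes, 2);
console.log(results.map((r) => r.id));
```

In this toy data, the two dental notes rank above the unrelated one even though they share no words with each other, which is the property the feature list describes.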
How It Works
To set it up, deploy the server with the one-click deploy button on the repo, then add instructions to Claude's system prompt. The four tools divide the work: remember stores a note, recall runs a semantic search over the stored embeddings, list_recent shows recent notes, and forget deletes a note. The stack is TypeScript on Cloudflare Workers with D1, Vectorize, and Workers AI.
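
The behavior of the four tools can be modeled with a small in-memory store. This is an illustrative sketch only, not the project's implementation: NoteStore, Similarity, and wordOverlap are hypothetical names, an array stands in for D1, and recall uses a naive word-overlap score purely as a placeholder where the real server would query embeddings.

```typescript
// In-memory model of the four tools: remember, recall, list_recent, forget.
// Assumption: an array replaces the database and word overlap replaces embeddings.

interface Note {
  id: number;
  text: string;
}

type Similarity = (query: string, text: string) => number;

// Placeholder scorer: counts query words that also appear in the note.
const wordOverlap: Similarity = (query, text) => {
  const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && words.has(w)).length;
};

class NoteStore {
  private notes: Note[] = [];
  private nextId = 1;
  private similarity: Similarity;

  constructor(similarity: Similarity = wordOverlap) {
    this.similarity = similarity;
  }

  // remember: store a note and return it with its assigned id.
  remember(text: string): Note {
    const note: Note = { id: this.nextId++, text };
    this.notes.push(note);
    return note;
  }

  // recall: return the best-scoring notes for a query, dropping zero scores.
  recall(query: string, limit = 3): Note[] {
    return this.notes
      .map((note) => ({ note, score: this.similarity(query, note.text) }))
      .filter((entry) => entry.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((entry) => entry.note);
  }

  // list_recent: newest notes first.
  listRecent(limit = 5): Note[] {
    if (limit <= 0) return [];
    return this.notes.slice(-limit).reverse();
  }

  // forget: delete by id; returns whether a note was actually removed.
  forget(id: number): boolean {
    const before = this.notes.length;
    this.notes = this.notes.filter((n) => n.id !== id);
    return this.notes.length < before;
  }
}

const store = new NoteStore();
const deadline = store.remember("Project deadline is March 3");
const editor = store.remember("Favorite editor is Neovim");
const taxes = store.remember("Deadline for taxes is April 15");

const recalled = store.recall("deadline");
const forgotten = store.forget(editor.id);
const recent = store.listRecent(2);
console.log(recalled, forgotten, recent);
```

Passing the scorer into the constructor keeps the storage logic separate from the search logic, so an embedding-based scorer could replace the placeholder without touching the other three tools.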
Tradeoffs and Implementation Details
The author notes that semantic search involves tradeoffs in embedding quality, latency, and cost, all of which are discussed in the Reddit thread. The free tier handles personal use without issue.
📖 Read the full source: r/ClaudeAI