Local semantic search for AI conversations with fastembed and LanceDB

A developer has implemented a local semantic search system for AI conversation history, processing 368K messages without cloud dependencies or API keys. The project uses fastembed with the BAAI/bge-small-en-v1.5 model for CPU-based embeddings and LanceDB as a vector store that operates as a single directory without a server process.
Technical Stack
- Embeddings: fastembed with BAAI/bge-small-en-v1.5 model (384 dimensions)
- Vector store: LanceDB - single directory, no server process, append-friendly
- Ingest: Pulls from JSONL session transcripts (Claude Code, any chat export)
- Embedding performance: ~500 docs/sec on M2 CPU
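
The stack above can be wired together in a few lines. The following is a minimal sketch, not the project's actual code. It assumes a simplified JSONL schema in which every line is an object with string fields `role` and `content` (real exports may nest these differently). The helper names `read_messages`, `build_index`, and `search`, the table name `messages`, and the column names are hypothetical. The fastembed and LanceDB imports are deferred into the functions that use them, so the parsing helper runs without either package installed.

```python
import json
from pathlib import Path
from typing import Iterator

MODEL_NAME = "BAAI/bge-small-en-v1.5"  # 384-dimensional, runs on CPU


def read_messages(path: str) -> Iterator[dict]:
    """Yield {"role", "text"} dicts from a JSONL transcript.

    Assumed (hypothetical) schema: one JSON object per line with string
    fields "role" and "content". Blank and malformed lines are skipped.
    """
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(obj, dict):
                continue
            role, content = obj.get("role"), obj.get("content")
            if isinstance(role, str) and isinstance(content, str) and content.strip():
                yield {"role": role, "text": content.strip()}


def build_index(jsonl_path: str, db_dir: str, table: str = "messages"):
    """Embed every message and write the result to a LanceDB directory."""
    import lancedb
    from fastembed import TextEmbedding

    rows = list(read_messages(jsonl_path))
    if not rows:
        raise ValueError("no messages found to index")
    model = TextEmbedding(model_name=MODEL_NAME)
    vectors = model.embed([r["text"] for r in rows])  # yields numpy arrays
    data = [{"vector": v.tolist(), **r} for v, r in zip(vectors, rows)]
    db = lancedb.connect(db_dir)  # a plain directory, no server process
    return db.create_table(table, data=data, mode="overwrite")


def search(db_dir: str, query: str, k: int = 5, table: str = "messages"):
    """Return the k nearest stored messages for a free-text query."""
    import lancedb
    from fastembed import TextEmbedding

    model = TextEmbedding(model_name=MODEL_NAME)
    query_vec = next(iter(model.embed([query]))).tolist()
    tbl = lancedb.connect(db_dir).open_table(table)
    return tbl.search(query_vec).limit(k).to_list()
```

With both packages installed, `build_index("session.jsonl", "./brain.lance")` followed by `search("./brain.lance", "how did I fix the auth bug")` would exercise the whole path. Both file names are placeholders.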
Key Implementation Details
The developer reported several practical lessons from 4 months of iteration:
- Selective embedding: Early versions embedded every message, which lowered the signal-to-noise ratio. The current implementation embeds only user messages and assistant messages with substance (skipping responses like "sure, here's that code"), cutting vector count by 60% while improving search quality.
- Chunking strategy: Switching from fixed-size chunks to conversation-turn chunks improved retrieval relevance far more than any model swap did. The alternative models tried (nomic-embed-text, bge-large, all-MiniLM) made only marginal differences compared with the chunking approach.
- LanceDB advantages: The developer called LanceDB "stupidly underrated for personal-scale": there is no server and no Docker, just a directory to which new vectors can be appended instantly. It replaced an overengineered pgvector setup.
- Re-embedding workflow: The bge-small-en-v1.5 model at 384 dimensions is fast enough to re-embed hourly as a cron job. A full re-index of 117K vectors takes approximately 4 minutes on M2 hardware.
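
The selective-embedding and turn-chunking ideas can be illustrated as follows. This is a sketch under stated assumptions. The source does not say how "substance" is judged, so the filler prefixes and the 80-character threshold here are invented for illustration. Grouping each user message with the assistant replies that follow it is one plausible reading of "conversation-turn chunks", not necessarily the project's. All names (`Message`, `has_substance`, `turn_chunks`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical heuristics: the source does not specify the actual rules.
FILLER_PREFIXES = ("sure", "ok", "okay", "got it", "here's", "here is")
MIN_ASSISTANT_CHARS = 80


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


def has_substance(msg: Message) -> bool:
    """Keep non-empty user messages; drop short filler assistant replies."""
    text = msg.text.strip()
    if not text:
        return False
    if msg.role == "user":
        return True
    if msg.role != "assistant":
        return False
    is_filler = text.lower().startswith(FILLER_PREFIXES)
    return not (is_filler and len(text) < MIN_ASSISTANT_CHARS)


def turn_chunks(messages: list[Message]) -> list[str]:
    """Group each user message with the assistant replies that follow it.

    Produces one chunk per conversation turn, rather than fixed-size
    windows that can separate a question from its answer.
    """
    chunks: list[str] = []
    current: list[str] = []
    for msg in messages:
        if not has_substance(msg):
            continue
        # A new user message closes the previous turn.
        if msg.role == "user" and current:
            chunks.append("\n".join(current))
            current = []
        current.append(f"{msg.role}: {msg.text.strip()}")
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each returned string would then be embedded as a single vector, so a query can match the question and its answer together.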
Performance Metrics
- Total messages ingested: 368K
- Vectors indexed: 117K
- Search latency (p50): 12ms across 117K vectors
- Full re-index time: ~4 minutes (M2)
- Storage: ~180MB on disk
- API keys needed: 0
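
The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that vectors are stored as 4-byte float32 values and that 1 MB means 10^6 bytes. Neither assumption is stated in the source, and the on-disk total would also include message text and metadata, so the match is approximate.

```python
VECTORS = 117_000       # vector count from the search latency figure
DIMS = 384              # bge-small-en-v1.5 output dimensions
BYTES_PER_FLOAT32 = 4   # assumption: vectors stored as float32
DOCS_PER_SEC = 500      # reported embedding throughput

# Raw size of the vectors alone, ignoring text and metadata columns.
raw_vector_mb = VECTORS * DIMS * BYTES_PER_FLOAT32 / 1_000_000

# Time to re-embed everything at the reported throughput.
reindex_minutes = VECTORS / DOCS_PER_SEC / 60

print(f"raw vectors: {raw_vector_mb:.1f} MB")       # raw vectors: 179.7 MB
print(f"full re-embed: {reindex_minutes:.1f} min")  # full re-embed: 3.9 min
```

Both results line up with the reported ~180MB of storage and ~4 minute full re-index.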
The project is open source under MIT license and available at github.com/mordechaipotash/brain-mcp. Installation is via pipx install brain-mcp && brain-mcp setup.
📖 Read the full source: r/LocalLLaMA