Local semantic search for AI conversations with fastembed and LanceDB

✍️ OpenClawRadar | 📅 Published: March 20, 2026

A developer has implemented a local semantic search system for AI conversation history, processing 368K messages without cloud dependencies or API keys. The project uses fastembed with the BAAI/bge-small-en-v1.5 model for CPU-based embeddings and LanceDB as a vector store that operates as a single directory without a server process.

Technical Stack

  • Embeddings: fastembed with BAAI/bge-small-en-v1.5 model (384 dimensions)
  • Vector store: LanceDB - single directory, no server process, append-friendly
  • Ingest: Pulls from JSONL session transcripts (Claude Code, any chat export)
  • Embedding performance: ~500 docs/sec on M2 CPU
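
As a rough illustration of how the pieces listed above might fit together, the sketch below reads a JSONL transcript, embeds the text with fastembed, and stores the vectors in a LanceDB directory. The JSONL keys (role, content), the table name messages, the default directory path, and the helper names are assumptions made for this example; they are not taken from brain-mcp's actual schema or code.

```python
import json
from pathlib import Path
from typing import Iterator

MODEL_NAME = "BAAI/bge-small-en-v1.5"  # 384-dimensional, runs on CPU
TABLE_NAME = "messages"                # hypothetical table name


def iter_messages(path: Path) -> Iterator[dict]:
    """Yield one dict per usable JSONL line (assumed keys: 'role', 'content')."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            content = record.get("content")
            # Skip records whose content is missing, empty, or not plain text.
            if isinstance(content, str) and content.strip():
                yield {"role": record.get("role", "unknown"), "text": content}


def build_index(jsonl_path: Path, db_dir: str = "./conversations.lancedb"):
    """Embed every message in the file and append the rows to a LanceDB table."""
    # Imported lazily so the parsing helper works without these packages.
    import lancedb
    from fastembed import TextEmbedding

    messages = list(iter_messages(jsonl_path))
    if not messages:
        return None
    model = TextEmbedding(model_name=MODEL_NAME)
    vectors = model.embed([m["text"] for m in messages])  # yields numpy arrays
    rows = [
        {"vector": vec.tolist(), "role": m["role"], "text": m["text"]}
        for m, vec in zip(messages, vectors)
    ]
    db = lancedb.connect(db_dir)  # a plain directory on disk, no server process
    if TABLE_NAME in db.table_names():
        table = db.open_table(TABLE_NAME)
        table.add(rows)  # append new vectors to the existing table
    else:
        table = db.create_table(TABLE_NAME, data=rows)
    return table


def search(query: str, db_dir: str = "./conversations.lancedb", k: int = 5) -> list:
    """Return the k stored rows nearest to the embedded query."""
    import lancedb
    from fastembed import TextEmbedding

    model = TextEmbedding(model_name=MODEL_NAME)
    query_vec = next(iter(model.embed([query])))
    table = lancedb.connect(db_dir).open_table(TABLE_NAME)
    return table.search(query_vec.tolist()).limit(k).to_list()
```

Calling build_index(Path("session.jsonl")) once and then search("how did I fix the auth bug") would return the closest stored messages. A real ingest would likely batch the embedding calls and track which transcripts have already been indexed.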

Key Implementation Details

The developer reports several practical lessons from four months of iteration:

  • Selective embedding: Early versions embedded every message, which lowered the signal-to-noise ratio. The current implementation embeds only user messages and assistant messages with substance (skipping responses like "sure, here's that code"), cutting the vector count by 60% while improving search quality.
  • Chunking strategy: Switching from fixed-size chunks to conversation-turn chunks made a massive difference in retrieval relevance. Model choice mattered far less: the alternatives tried (nomic-embed-text, bge-large, all-MiniLM) showed only marginal differences compared with the change in chunking approach.
  • LanceDB advantages: The developer found LanceDB "stupidly underrated for personal-scale" use: no server, no Docker, just a directory to which new vectors can be appended instantly. It replaced an overengineered pgvector setup.
  • Re-embedding workflow: The bge-small-en-v1.5 model at 384 dimensions is fast enough to re-embed hourly as a cron job. A full re-index of 117K vectors takes approximately 4 minutes on M2 hardware.
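
The selective-embedding and conversation-turn ideas above can be sketched in a few lines. The source does not say how "substance" is judged, so the word-count threshold and filler prefixes here are purely hypothetical heuristics, and the Message type and function names are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical heuristics: the article does not specify the real filter.
MIN_WORDS = 12
FILLER_PREFIXES = ("sure", "okay", "ok", "got it", "done", "here's", "here is")


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


def has_substance(msg: Message) -> bool:
    """Keep non-empty user messages; drop short assistant acknowledgements."""
    text = msg.text.strip()
    if not text:
        return False
    if msg.role == "user":
        return True
    is_short = len(text.split()) < MIN_WORDS
    is_filler = text.lower().startswith(FILLER_PREFIXES)
    return not (is_short and is_filler)


def turn_chunks(messages: list) -> list:
    """Group each user message with the assistant replies that follow it."""
    chunks, current = [], []
    for msg in messages:
        # A new user message closes the previous turn and starts the next one.
        if msg.role == "user" and current:
            chunks.append("\n".join(current))
            current = []
        current.append(f"{msg.role}: {msg.text.strip()}")
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A typical call would be turn_chunks([m for m in messages if has_substance(m)]), which yields one text chunk per conversation turn, ready to embed. Unlike fixed-size windows, each chunk then keeps a question together with its answer.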

Performance Metrics

  • Total messages ingested: 368K
  • Vectors indexed: 117K
  • Search latency (p50): 12ms across 117K vectors
  • Full re-index time: ~4 minutes (M2)
  • Storage: ~180MB on disk
  • API keys needed: 0
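
A quick back-of-the-envelope check shows that these figures hang together. The calculation below assumes vectors are stored as uncompressed 32-bit floats, which the source does not state, and it ignores the text and metadata columns, so treat it as a rough estimate only.

```python
VECTORS = 117_000        # vectors searched and re-indexed, per the article
DIMENSIONS = 384         # bge-small-en-v1.5 embedding size
BYTES_PER_FLOAT32 = 4    # assumption: uncompressed float32 storage
REINDEX_SECONDS = 4 * 60 # "~4 minutes" for a full re-index

# Raw size of the vector data alone, in megabytes (10**6 bytes).
raw_vector_mb = VECTORS * DIMENSIONS * BYTES_PER_FLOAT32 / 1_000_000

# Average throughput implied by the full re-index time.
reindex_rate = VECTORS / REINDEX_SECONDS

print(f"raw vectors: {raw_vector_mb:.1f} MB")
print(f"re-index rate: {reindex_rate:.1f} vectors/sec")
```

Under these assumptions the raw vectors come to about 179.7 MB, in line with the ~180MB reported on disk, and the re-index time implies roughly 487.5 vectors per second, close to the quoted ~500 docs/sec embedding rate.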

The project is open source under the MIT license and available at github.com/mordechaipotash/brain-mcp. To install it, run: pipx install brain-mcp && brain-mcp setup

📖 Read the full source: r/LocalLLaMA
