Hybrid search with RRF improves AI memory system over pure vector search

✍️ OpenClawRadar | 📅 Published: April 15, 2026 | 🔗 Source

A developer has released an open-source memory system for AI assistants, built on PostgreSQL with pgvector in a local-first, self-hosted setup. The system stores the information an assistant needs to remember across sessions and makes it searchable.

Why pure vector search wasn't enough

The developer started with pure vector search: embed the query, rank stored chunks by cosine similarity, and return the top-k results. This worked for vague questions but consistently failed on exact matches. For example, searching for "RRF merging" would return months-old chunks about "combining ranked lists" instead of the document that literally says "RRF merging."
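
To make the baseline concrete, here is a minimal in-memory sketch of top-k retrieval by cosine similarity. It is an illustration only: the real system runs this inside PostgreSQL via pgvector, and the function names and the dict-of-vectors layout are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Return the k (chunk_id, similarity) pairs most similar to query_vec.

    `chunks` is a hypothetical mapping of chunk id -> embedding vector.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Nothing in this ranking looks at the literal words of the query, which is why a chunk that merely paraphrases the topic can outrank the one containing the exact phrase.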

Hybrid search solution

The fix was a second search arm: full-text search using PostgreSQL's tsvector type with a GIN index. Keyword matching catches the exact terms that vector search misses. The catch is that the system now produces two ranked lists that have to be merged.
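
A keyword arm like the one described could be sketched as follows. The table name `chunks`, the `tsv` column, the 'english' text search configuration, and the use of plainto_tsquery are assumptions for illustration, not details from the project. The function expects an async psycopg 3 connection, whose execute() returns a cursor.

```python
# Hypothetical schema: chunks(id, content, tsv tsvector) with a GIN index on tsv.
KEYWORD_SQL = """
SELECT c.id
FROM chunks AS c, plainto_tsquery('english', %s) AS q
WHERE c.tsv @@ q
ORDER BY ts_rank(c.tsv, q) DESC
LIMIT %s
"""


async def keyword_search(conn, query, limit=20):
    """Return chunk ids ranked by ts_rank, best match first.

    `conn` is assumed to behave like psycopg.AsyncConnection.
    """
    cur = await conn.execute(KEYWORD_SQL, (query, limit))
    rows = await cur.fetchall()
    return [row[0] for row in rows]
```

Note that plainto_tsquery requires every query term to match; a looser variant would need a different tsquery constructor. Only the order of the returned ids matters downstream, since the merge step works on rank positions.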

Reciprocal Rank Fusion (RRF)

Reciprocal Rank Fusion proved to be the answer for merging the two ranked lists. The formula is simple: each list contributes score = 1 / (k + rank) for a result, where rank is the result's position in that list and k = 60 (the standard value). A result that appears in both lists gets both scores added together. This approach requires no weight tuning and no score normalization between cosine similarity and ts_rank, because it uses only rank positions.
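
The fusion step is small enough to sketch in a few lines. This version assumes ranks start at 1 and that each input is a list of result ids ordered best-first; the function name is hypothetical.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse several best-first id lists using Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) to an id's score (rank is 1-based, an
    assumption of this sketch). Returns (id, score) pairs, highest first.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]   # ids from the vector arm, best first
keyword_hits = ["c", "d"]       # ids from the full-text arm, best first
merged = rrf_merge([vector_hits, keyword_hits])
```

Here "c" ends up first because it collects 1/63 from the vector list and 1/61 from the keyword list, while every other id scores from one list only.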

Query enrichment technique

Before searching, the system runs each query through the embedding model's WordPiece tokenizer to extract key terms: words that the tokenizer splits into multiple subword tokens, which are likely to be technical or domain terms. From these it generates up to 3 query variations, embeds all of them, and searches in parallel. This catches results that a single phrasing might miss.
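
One way the key-term extraction might work is sketched below. WordPiece marks continuation pieces with a "##" prefix, so a word made of two or more pieces can be detected by regrouping the token list. How the project actually builds its variations is not described, so the strategy here (original query, then all key terms joined, then individual key terms) is purely an assumption, as are the function names. The `tokenize` argument is any callable returning WordPiece token strings, for example the tokenize method of a Hugging Face tokenizer loaded for all-MiniLM-L6-v2.

```python
def key_terms(tokens):
    """Regroup WordPiece tokens into words; keep words built from 2+ pieces."""
    words = []  # list of (word, piece_count)
    for tok in tokens:
        if tok.startswith("##") and words:
            word, count = words[-1]
            words[-1] = (word + tok[2:], count + 1)  # glue continuation piece
        else:
            words.append((tok, 1))
    terms = []
    for word, count in words:
        if count > 1 and word not in terms:
            terms.append(word)
    return terms


def query_variations(query, tokenize, max_variations=3):
    """Return the original query plus key-term variations, without duplicates."""
    terms = key_terms(tokenize(query))
    candidates = [" ".join(terms)] + terms if terms else []
    variations = [query]
    for candidate in candidates:
        if len(variations) >= max_variations:
            break
        if candidate not in variations:
            variations.append(candidate)
    return variations
```

Each variation would then be embedded and searched separately, with the resulting ranked lists fed into the same rank-based merge.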

Technical stack

  • PostgreSQL 16 + pgvector (HNSW index for vectors, GIN index for full-text)
  • all-MiniLM-L6-v2 for embeddings (384 dimensions, runs on CPU)
  • Python with async psycopg 3
  • 3 ingestion adapters: markdown, plaintext, and Claude conversation JSON
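
As a rough picture of how the first two items fit together, a schema along these lines would give one table an HNSW index for vectors and a GIN index for full-text search. Table, column, and index names are hypothetical, as is the choice of a generated tsvector column with the 'english' configuration; only the index types and the 384-dimension vector size come from the list above.

```python
# Hypothetical DDL; run once against a database where pgvector is installed.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS chunks (
        id        bigserial PRIMARY KEY,
        content   text NOT NULL,
        embedding vector(384),
        tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
    )
    """,
    # HNSW index using cosine distance, matching the <=> operator in queries.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
    # GIN index backing the @@ full-text match.
    "CREATE INDEX IF NOT EXISTS chunks_tsv_gin ON chunks USING gin (tsv)",
]


async def create_schema(conn):
    """Execute each DDL statement; the caller is responsible for committing.

    `conn` is assumed to behave like psycopg.AsyncConnection.
    """
    for statement in SCHEMA_STATEMENTS:
        await conn.execute(statement)
```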

The entire system runs locally with no API calls for embeddings and no cloud dependencies. The code was recently shipped, and the developer has written a detailed blog post about the full approach.

📖 Read the full source: r/LocalLLaMA
