Hybrid RAG for Local Agent Memory with OpenClaw, Ollama, and nomic-embed-text

Problem: Retrieval, Not Storage
The developer had months of daily memory logs stored in markdown files, which worked for saving information but not for finding it again. When the agent needed past context, it would fall back to running ls, opening files one by one, spending tokens, and sometimes missing relevant information. The issue was retrieval by meaning, not storage.
Solution: Hybrid RAG with Local Embeddings
The developer enabled memorySearch in OpenClaw with Ollama as the provider and nomic-embed-text for local embeddings, running in hybrid mode. Hybrid here means 70% vector similarity (cosine similarity over nomic-embed-text embeddings) combined with 30% BM25 keyword matching. The vector side handles semantic proximity, while BM25 handles exact names, versions, and IDs. MMR (maximal marginal relevance) reduces redundant results, and temporal decay gives more weight to recent logs. Everything runs locally, with no external APIs.
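To make the scoring terms concrete, here is a minimal sketch of weighted score fusion, half-life decay, and greedy MMR selection. These are the generic textbook formulations, not OpenClaw's actual implementation: the source does not say how OpenClaw normalizes BM25 scores or where it applies decay. The function names are hypothetical, and the constants mirror the values in this article's configuration (0.7 / 0.3 weights, lambda 0.7, 30-day half-life).

```python
import math

VECTOR_WEIGHT = 0.7    # vectorWeight in the article's config
TEXT_WEIGHT = 0.3      # textWeight in the article's config
MMR_LAMBDA = 0.7       # mmr.lambda in the article's config
HALF_LIFE_DAYS = 30    # temporalDecay.halfLifeDays in the article's config


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(vector_sim, keyword_score):
    """Weighted sum of the two signals.

    Assumption: keyword_score is a BM25 score already normalized to [0, 1].
    """
    return VECTOR_WEIGHT * vector_sim + TEXT_WEIGHT * keyword_score


def temporal_decay(age_days, half_life_days=HALF_LIFE_DAYS):
    """Multiplier that halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def mmr_select(candidates, k, lam=MMR_LAMBDA):
    """Greedy maximal marginal relevance.

    candidates: list of (doc_id, relevance, embedding) tuples.
    Each step picks the candidate maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected items.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_value(cand):
            redundancy = max(
                (cosine(cand[2], s[2]) for s in selected), default=0.0
            )
            return lam * cand[1] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]
```

With lambda at 0.7, a near-duplicate of an already selected note loses up to 0.3 from its score, so a less relevant but distinct note can outrank it. A 30-day half-life means a 60-day-old log keeps a quarter of its weight.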
Configuration
"memorySearch": {
  "provider": "ollama",
  "query": {
    "hybrid": {
      "enabled": true,
      "vectorWeight": 0.7,
      "textWeight": 0.3,
      "mmr": {
        "enabled": true,
        "lambda": 0.7
      },
      "temporalDecay": {
        "enabled": true,
        "halfLifeDays": 30
      }
    }
  }
}
Setup Instructions
- OpenClaw detects Ollama automatically at localhost:11434
- No need to specify baseUrl or model - it picks up nomic-embed-text if pulled
- Run ollama pull nomic-embed-text first, then restart the gateway
- Avoid setting provider: "openai" and pointing baseUrl to Ollama - use provider: "ollama" directly
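Before restarting the gateway, it can help to confirm that Ollama is reachable and the model is pulled. The sketch below is one way to do that from Python using only the standard library. It assumes a reasonably recent Ollama build that exposes GET /api/tags and POST /api/embed (older builds use /api/embeddings with a "prompt" field instead). The helper names are hypothetical and not part of OpenClaw or Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # default address from the setup notes
MODEL = "nomic-embed-text"


def model_is_pulled(tags_payload, model=MODEL):
    """Check parsed JSON from GET /api/tags for the model (any tag)."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def build_embed_request(text, model=MODEL, base_url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_ollama(base_url=OLLAMA_URL):
    """Return a short status string; raises URLError if Ollama is not running."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        tags = json.load(resp)
    if not model_is_pulled(tags):
        return "model missing: run `ollama pull nomic-embed-text`"
    request = build_embed_request("hello", base_url=base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        dims = len(json.load(resp)["embeddings"][0])
    return f"ok: {dims}-dimensional embedding"
```

Calling check_ollama() with Ollama running should report the embedding dimension. A connection error means Ollama is not listening on that address.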
Behavioral Change Required
Enabling the tool wasn't enough. Without explicit instructions to use memorySearch before reading files directly, the agent would skip it and take the slower, token-heavy route. The developer wrote a rule into both AGENTS.md and MEMORY.md in the workspace to make memory search part of the agent's normal workflow.
Before vs After Results
- Before: Browse folders, open files blindly, hope wording matches, waste tokens, miss context
- After: Run memory_search with semantic query, retrieve ranked results with similarity scores, open best match, answer from actual past notes
- Similarity scores for relevant results typically range 0.45 to 0.48 for nomic-embed-text on prose logs
Practical Notes
- nomic-embed-text has a 2048-token context limit by default, not 8192, so large files may get truncated at indexing
- Memory files in Spanish work well - nomic-embed-text handles Spanish without issues
- Retrieval quality depends on note quality - semantic search still struggles with vague logs
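If long log files run into the context limit noted above, one option is to split them into smaller pieces before they are indexed. The sketch below is a hypothetical helper, not something OpenClaw provides, and the source does not say whether OpenClaw already chunks files itself. It packs paragraphs greedily under a word budget. The 1200-word default is an assumed, conservative stand-in for 2048 tokens, since the real token count depends on the model's tokenizer.

```python
def chunk_text(text, max_words=1200):
    """Split text into chunks of at most max_words words.

    Paragraphs (separated by blank lines) are kept together when possible.
    A single paragraph longer than the budget is hard-split on word boundaries.
    """
    chunks = []
    current = []   # paragraphs collected for the chunk being built
    count = 0      # words in the chunk being built

    def flush():
        nonlocal current, count
        if current:
            chunks.append("\n\n".join(current))
        current, count = [], 0

    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue
        if len(words) > max_words:
            # Oversized paragraph: emit what we have, then split it by words.
            flush()
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if count + len(words) > max_words:
            flush()
        current.append(para.strip())
        count += len(words)

    flush()
    return chunks
```

Splitting on paragraph boundaries keeps each day's entry or section mostly intact, which tends to matter more for retrieval than hitting the budget exactly.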
Tech Stack
- OpenClaw (local, self-hosted)
- Ollama + nomic-embed-text:latest
- SQLite with sqlite-vec and FTS5 (created automatically by OpenClaw on first use)
- Mac mini M4, 16GB unified memory
📖 Read the full source: r/openclaw