Building a Persistent AI Knowledge Infrastructure with OpenClaw

A developer has built a full knowledge infrastructure system called 'Brain' on top of OpenClaw to address the statelessness common in AI agent setups, where nothing carries over from one session to the next. The system provides persistent memory across sessions, allowing users to query past decisions and workflow history.
Core Architecture
Brain serves as the central knowledge service where documents are ingested, chunked, and embedded locally using Ollama. Data is stored across multiple databases: Postgres, MongoDB, and Qdrant, with relationships mapped in a Memgraph graph database. This makes every decision, session, and workflow run searchable and connected.
Search and Retrieval
Search in Brain uses hybrid retrieval combining semantic search via Qdrant with BM25 full-text search from Postgres, merged using reciprocal rank fusion. Results are automatically deduplicated and context-budgeted before synthesis.
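
As a rough illustration of the fusion step, here is a minimal sketch of reciprocal rank fusion over two ranked lists of document IDs. The function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions for the example, not details taken from Brain itself.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    where rank is 1-based. k=60 is an assumed default, not Brain's setting.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
semantic_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. vector search
bm25_hits = ["doc_b", "doc_d", "doc_a"]      # e.g. full-text search

fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
print(fused)
```

Because scores are keyed by document ID, a document returned by both retrievers appears only once in the fused list, which gives a simple form of deduplication.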
RAG Agent and Plugin System
On top of Brain sits a RAG Agent that runs a complete pipeline: retrieve → graph expand → fuse → synthesize, all powered by local Ollama models. The agent estimates confidence on every answer and automatically logs 'knowledge gaps' to a pending queue when confidence is low.
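
The confidence gate described above could look something like the following sketch. All names here (RagAnswer, GapQueue, log_gap_if_uncertain) and the 0.5 threshold are hypothetical; the source does not say how confidence is computed or where the pending queue is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RagAnswer:
    question: str
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


@dataclass
class GapQueue:
    """In-memory stand-in for the pending knowledge-gap queue."""
    pending: List[Dict] = field(default_factory=list)

    def add(self, answer: RagAnswer) -> None:
        self.pending.append({
            "question": answer.question,
            "confidence": answer.confidence,
            "status": "pending",
        })


def log_gap_if_uncertain(answer: RagAnswer, queue: GapQueue,
                         threshold: float = 0.5) -> bool:
    """Queue the question as a knowledge gap when confidence is below threshold."""
    if answer.confidence < threshold:
        queue.add(answer)
        return True
    return False


# Illustrative answers only; the questions and scores are made up.
queue = GapQueue()
confident = RagAnswer("Which store serves semantic search?", "Qdrant", 0.9)
unsure = RagAnswer("Why was the cron interval changed?", "Not found in context", 0.2)
flags = [log_gap_if_uncertain(a, queue) for a in (confident, unsure)]
print(flags, queue.pending)
```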
The system also includes a plugin system with 33+ typed tools that agents can call, including brain_search, brain_ingest, brain_rag_query, brain_graph_slice, and brain_condense_domain. Every operation exposes a strictly typed interface.
Workflows and Observability
Workflows are first-class citizens in this system. Multi-step pipelines (orient, fetch, inspect, synthesize, log) can be run either through agents or via a deterministic runner on a cron schedule with zero LLM involvement. Telemetry and observability are the same in both modes.
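
To make the deterministic path concrete, the sketch below runs the five named steps as plain Python functions with no model calls and records one telemetry entry per step. The step bodies, the context dictionary, and the telemetry fields are placeholders invented for illustration; only the step names come from the description above.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Callable[[Dict], Dict]


# Placeholder steps: each takes the shared context and returns an updated copy.
def orient_step(ctx: Dict) -> Dict:
    return {**ctx, "goal": "summarize new notes"}

def fetch_step(ctx: Dict) -> Dict:
    return {**ctx, "items": ["note 1", "note 2"]}

def inspect_step(ctx: Dict) -> Dict:
    return {**ctx, "kept": [item for item in ctx["items"] if item]}

def synthesize_step(ctx: Dict) -> Dict:
    return {**ctx, "summary": f"{len(ctx['kept'])} items kept"}

def log_step(ctx: Dict) -> Dict:
    return {**ctx, "logged": True}


PIPELINE: List[Tuple[str, Step]] = [
    ("orient", orient_step),
    ("fetch", fetch_step),
    ("inspect", inspect_step),
    ("synthesize", synthesize_step),
    ("log", log_step),
]


def run_workflow(steps: List[Tuple[str, Step]], ctx: Dict) -> Tuple[Dict, List[Dict]]:
    """Run steps in a fixed order and collect per-step timing telemetry."""
    telemetry: List[Dict] = []
    for name, step in steps:
        started = time.perf_counter()
        ctx = step(ctx)
        telemetry.append({"step": name, "seconds": time.perf_counter() - started})
    return ctx, telemetry


result, telemetry = run_workflow(PIPELINE, {})
print(result["summary"], [entry["step"] for entry in telemetry])
```

A script like this could be invoked from cron; an agent-driven run would emit the same telemetry shape if it called the same step functions.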
Each agent has a strict mandate and communicates through structured handoffs, with all activity tracked back into Brain as searchable history. A Python drift checker compares live agent configs against Brain snapshots, automatically logging structured events when tool allowlists or plugin versions change.
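
The drift check can be pictured as a diff between two dictionaries. The sketch below assumes a config shape with a tool_allowlist list and a plugin_versions mapping, and an event format with a type field; these are guesses made for the example, not the actual schema used by the drift checker.

```python
from typing import Dict, List


def diff_agent_config(snapshot: Dict, live: Dict) -> List[Dict]:
    """Compare a stored snapshot with a live config and return drift events."""
    events: List[Dict] = []

    # Tool allowlist changes (assumed key: "tool_allowlist").
    old_tools = set(snapshot.get("tool_allowlist", []))
    new_tools = set(live.get("tool_allowlist", []))
    for tool in sorted(new_tools - old_tools):
        events.append({"type": "tool_added", "tool": tool})
    for tool in sorted(old_tools - new_tools):
        events.append({"type": "tool_removed", "tool": tool})

    # Plugin version changes (assumed key: "plugin_versions").
    old_plugins = snapshot.get("plugin_versions", {})
    new_plugins = live.get("plugin_versions", {})
    for name in sorted(set(old_plugins) | set(new_plugins)):
        if old_plugins.get(name) != new_plugins.get(name):
            events.append({
                "type": "plugin_version_changed",
                "plugin": name,
                "from": old_plugins.get(name),
                "to": new_plugins.get(name),
            })
    return events


# Hypothetical snapshot and live config; the plugin name and versions are made up.
snapshot = {
    "tool_allowlist": ["brain_search", "brain_ingest"],
    "plugin_versions": {"example-plugin": "1.0.0"},
}
live = {
    "tool_allowlist": ["brain_search", "brain_rag_query"],
    "plugin_versions": {"example-plugin": "1.1.0"},
}

events = diff_agent_config(snapshot, live)
for event in events:
    print(event)
```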
Local Deployment and Future Plans
The entire system runs locally, using Ollama for embeddings and synthesis and Docker for all of the data stores. The core intelligence layer makes no OpenAI calls and depends on no external APIs.
Next steps include migrating the RAG agent to LlamaIndex Workflows, building out a shared brain-client SDK, and tightening the API surface. RAG endpoints are moving to a /v1/rag/ prefix, realm is becoming a header, and leaky DB facades are getting properly abstracted.
📖 Read the full source: r/openclaw