Local Semantic Memory Search for OpenClaw Agents Using Harrier Embeddings

A new repo shows how to give an OpenClaw agent local semantic memory search without sending embeddings to an external service. The approach runs a small local embedding server around Microsoft's Harrier model (microsoft/harrier-oss-v1-0.6b), exposes an Ollama-compatible API, and wires it to OpenClaw's memorySearch config.
How it works
The embedding server runs Harrier locally and provides /api/embed and /api/embeddings endpoints that match Ollama's API format. OpenClaw's memorySearch already supports Ollama-style endpoints, so pointing it at http://localhost:8000 gives the agent a semantic memory layer that runs entirely on the local machine.
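As a rough illustration of what a client call to such a server might look like, here is a minimal Python sketch. It assumes the server accepts the same JSON shape as Ollama's /api/embed (a "model" field and an "input" list, answered with an "embeddings" list). The model name passed in the request is an assumption; the repo's server may expect a different identifier or ignore the field.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"       # address given in the article
MODEL = "microsoft/harrier-oss-v1-0.6b"  # assumption: the server may expect another name


def build_embed_request(texts, model=MODEL, base_url=BASE_URL):
    """Build an Ollama-style POST request to /api/embed for a batch of texts."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body):
    """Return the list of vectors from an Ollama-style /api/embed response body."""
    return json.loads(body)["embeddings"]


def embed(texts, timeout=30):
    """Send texts to the local server and return one vector per input text."""
    request = build_embed_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embed_response(response.read())
```

With the server running, `embed(["deploy notes", "backup schedule"])` should return two vectors. Building and parsing are kept separate from the network call so the request shape can be inspected without a live server.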
Why this matters for agent memory
Most agent memory systems have two pain points:
- Shoving too much memory into the prompt burns tokens and makes context messy.
- Keeping memory files small and manual becomes hard to maintain as history grows.
Semantic memory search offers a middle path. Long-term memory stays in normal markdown files (MEMORY.md, daily logs, notes, project files) that are human-readable and editable. At runtime, the agent retrieves only relevant chunks.
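The retrieval step can be sketched in a few lines of Python. This is not OpenClaw's actual implementation: the chunking rule (split on blank lines), the function names, and the `toy_embed` stand-in are all assumptions made for illustration. `toy_embed` only counts words so the example runs offline; in practice the `embed_fn` argument would call the local embedding server.

```python
import math
import re
from collections import Counter


def chunk_markdown(text):
    """Split markdown into chunks on blank lines (assumed chunking rule)."""
    return [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, embed_fn, k=3):
    """Embed the query and chunks together, then return the k best (chunk, score) pairs."""
    vectors = embed_fn([query] + list(chunks))
    query_vec, chunk_vecs = vectors[0], vectors[1:]
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in zip(chunks, chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def toy_embed(texts):
    """Hypothetical stand-in: word-count vectors over the batch vocabulary.

    This is NOT semantic. It only lets the sketch run without a server.
    """
    counts = [Counter(re.findall(r"[a-z0-9]+", t.lower())) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[float(c[word]) for word in vocab] for c in counts]


memory = (
    "# Projects\n\n"
    "The deploy script lives in tools/deploy.sh.\n\n"
    "Alice prefers tea over coffee.\n\n"
    "Backups run nightly at 02:00."
)
best = top_k("where is the deploy script", chunk_markdown(memory), toy_embed, k=1)
print(best[0][0])
```

Only the top-scoring chunks go into the prompt, while the markdown files remain the source of truth. A real embedding model would also match chunks that share meaning but not wording, which the word-count stand-in cannot do.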
Benefits
- Less token waste: durable facts are not stuffed into every prompt.
- Cleaner memory files: no need to compress everything into one giant context blob.
- Better recall: search finds conceptually related notes even when the wording doesn't match exactly.
- Easier debugging: the source of truth is plain text, not an opaque vector database.
- Better privacy: embeddings are computed locally, and no data is shipped to a hosted API.
What the repo includes
- Small Python embedding server implementing Ollama-compatible endpoints
- Example OpenClaw memorySearch config
- macOS launchd service template
- Mock markdown memory corpus
- Smoke tests and local query demo
The repo is at github.com/promptclickrun/harrier-openclaw-memory-search.
📖 Read the full source: r/openclaw