Engram Memory SDK: Graph-Based Memory for AI Agents with Local Models

✍️ OpenClawRadar📅 Published: April 14, 2026🔗 Source

Graph Memory SDK for Local AI Models

Engram Memory SDK is an open-source graph memory system designed for AI agents that works with local models through LiteLLM integration. The core architecture separates ingestion from recall: you only need the LLM once during ingestion to extract entities and relationships, while recall operates through pure vector search, graph traversal, and scoring without requiring additional LLM calls.

Technical Details

The SDK is built with async Python and uses Neo4j as its backend database. According to the source, it averages ~735 tokens per ingestion operation and achieves 95ms recall latency. The system includes self-restructuring memory features with decay and clustering running in the background.

Setup and Installation

Installation is straightforward:

pip install engram-memory-sdk

Configuration requires a .env file with these variables:

LLM_MODEL=ollama/llama3 # or any LiteLLM-supported local model
NEO4J_URI=bolt://localhost:7687

The system supports any model via LiteLLM, including local deployments through Ollama, vLLM, and text-generation-webui. The key advantage is cost efficiency: with a small local model handling extraction, ongoing recall operations have literally $0 cost since they don't consume LLM tokens.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

Claude AI Session Compaction Issues and Workarounds

Default compaction in Claude AI sessions can degrade retrieval accuracy from ~9.75/10 to ~5/10, causing hallucinations. The user tested with 418K tokens and found manual compaction using Opus maintains accuracy while default compaction fails.

Mar 17, 2026, 07:45 PM UTC

OpenClawRadar

Tools

Culpa: Open Source Deterministic Replay Engine for AI Agent Debugging

Culpa is an open source tool that records LLM agent sessions with full execution context, enabling deterministic replay using recorded responses as stubs instead of hitting real APIs. It works with Anthropic and OpenAI APIs via proxy mode or Python SDK.

Apr 20, 2026, 05:38 PM UTC

OpenClawRadar

Tools

Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks

A developer tested multiple Qwen3.5 and Nemotron models, finding Qwen3.5-27B-GGUF:UD-Q6_K_XL performs well for development tasks on existing 2x RTX 3090 hardware, with 803 pp and 25 tg/s at 256k context on vast.ai.

Mar 28, 2026, 06:45 PM UTC

OpenClawRadar

Tools

Memtrace: Persistent, Time-Aware Codebase Memory for Claude Code Agents

Memtrace provides always-fresh snapshots and bi-temporal replay for Claude Code agents, using Tree-sitter AST parsing and hybrid retrieval (BM25 + Jina-code embeddings) with zero LLM inference cost during indexing.

May 4, 2026, 02:20 PM UTC

OpenClawRadar