Engram Memory SDK: Graph-Based Memory for AI Agents with Local Models

✍️ OpenClawRadar | 📅 Published: April 14, 2026 | 🔗 Source

Graph Memory SDK for Local AI Models

Engram Memory SDK is an open-source graph memory system for AI agents that works with local models through its LiteLLM integration. The core architecture separates ingestion from recall: the LLM is called only at ingestion time, to extract entities and relationships, while recall runs entirely on vector search, graph traversal, and scoring, with no further LLM calls.
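The recall path described above can be illustrated with a toy in-memory version. This is only a sketch of the general technique, not Engram's actual API: MemoryNode, recall, and hop_weight are hypothetical names, the graph is a plain dict rather than Neo4j, and the query embedding is assumed to come from a separate embedding model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical stored entity: a name, an embedding, and graph neighbours."""
    name: str
    embedding: list
    neighbors: set = field(default_factory=set)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_embedding, graph, top_k=1, hop_weight=0.5):
    """Rank nodes by similarity, then pull in one-hop neighbours of the best hits.

    No LLM call happens here; everything is arithmetic over stored data.
    """
    # Step 1: vector search, score every node against the query.
    sims = {name: cosine(query_embedding, node.embedding)
            for name, node in graph.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Step 2: graph traversal, neighbours of a seed inherit a discounted score.
    scores = {name: sims[name] for name in seeds}
    for seed in seeds:
        for nb in graph[seed].neighbors:
            if nb in graph:
                scores[nb] = max(scores.get(nb, 0.0), hop_weight * sims[seed])

    # Step 3: scoring, return (name, score) pairs, best first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


graph = {
    "Alice": MemoryNode("Alice", [1.0, 0.0], {"Acme"}),
    "Acme": MemoryNode("Acme", [0.0, 1.0], {"Alice"}),
    "Bob": MemoryNode("Bob", [-1.0, 0.0]),
}
results = recall([1.0, 0.0], graph)
```

Here "Acme" is not similar to the query on its own, but it is returned because it is linked to the best match, which is the kind of result a graph adds over plain vector search.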

Technical Details

The SDK is written in async Python and uses Neo4j as its backend database. According to the source, it averages about 735 tokens per ingestion operation and achieves 95 ms recall latency. Memory is self-restructuring: decay and clustering run in the background.
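The source does not say how decay is computed. One common approach is exponential decay with a half-life, where a memory loses half its strength each period it goes unused and is pruned below a threshold. The sketch below assumes that model; decayed_strength, decay_pass, and all parameter values are hypothetical and not taken from the SDK.

```python
def decayed_strength(strength, age_seconds, half_life_seconds=86_400.0):
    """Halve the strength for every half-life that has elapsed."""
    return strength * 0.5 ** (age_seconds / half_life_seconds)


def decay_pass(memories, now, half_life_seconds=86_400.0, prune_below=0.1):
    """Apply decay to every memory and drop the ones that fall below a threshold.

    `memories` maps a name to (strength, last_access_timestamp).
    Returns a dict of name -> current strength for the memories that survive.
    """
    kept = {}
    for name, (strength, last_access) in memories.items():
        current = decayed_strength(strength, now - last_access, half_life_seconds)
        if current >= prune_below:
            kept[name] = current
    return kept


now = 1_000_000.0
memories = {
    "fresh": (1.0, now),                 # accessed just now
    "stale": (1.0, now - 10 * 86_400),   # untouched for ten half-lives
}
kept = decay_pass(memories, now)
```

In a background job, a pass like this would run periodically so that recall never has to pay for it.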


Setup and Installation

Installation is straightforward:

pip install engram-memory-sdk

Configuration requires a .env file with these variables:

LLM_MODEL=ollama/llama3 # or any LiteLLM-supported local model
NEO4J_URI=bolt://localhost:7687
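To make the two values concrete, here is a minimal standard-library sketch of how such a file can be read. The SDK presumably loads .env itself; parse_env is a hypothetical helper for illustration only. It also shows that the LLM_MODEL value follows LiteLLM's provider/model naming.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Treat ' #' as the start of an inline comment (unquoted values only).
        value = value.split(" #", 1)[0].strip()
        config[key.strip()] = value
    return config


ENV_TEXT = """\
LLM_MODEL=ollama/llama3 # or any LiteLLM-supported local model
NEO4J_URI=bolt://localhost:7687
"""

config = parse_env(ENV_TEXT)
# LiteLLM model strings take the form "<provider>/<model>".
provider, _, model_name = config["LLM_MODEL"].partition("/")
```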

The system supports any LiteLLM-compatible model, including local deployments through Ollama, vLLM, and text-generation-webui. The key advantage is cost efficiency: a small local model handles extraction at ingestion time, and ongoing recall operations consume no LLM tokens, so they add no inference cost.

📖 Read the full source: r/LocalLLaMA
