LightMem: Lightweight Memory System for LLM Agents with 10×+ Gains and 100× Lower Cost

OpenClawRadar | Published: February 26, 2026

LightMem: A Practical Memory Layer for LLM Agents

LightMem is a lightweight, modular memory system for LLM agents. It targets long, multi-turn interactions, where context grows noisy and expensive, models get "lost in the middle," and existing memory systems add latency and token cost.

How LightMem Works

The system maintains compact, topical, and consistent memories through three key mechanisms:

Pre-compress sensory memory: Filters redundant and low-value tokens before storage
Topic-aware short-term memory: Clusters turns by topic and summarizes into precise memory units
Sleep-time long-term consolidation: Inserts new memories incrementally at runtime and defers high-fidelity updates to an offline pass, so consolidation adds no latency to live requests
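
The three mechanisms above can be pictured with a toy sketch. This is not LightMem's API: every name here (FILLER, pre_compress, topic_key, MemoryStore) is hypothetical, the filler-word filter stands in for a real compressor, keyword overlap stands in for topic clustering, and string joining stands in for LLM summarization. It only illustrates the flow of filtering first, buffering by topic, appending cheaply at runtime, and merging offline.

```python
from dataclasses import dataclass, field

# Hypothetical filler-word list; a stand-in for a real token compressor.
FILLER = {"um", "uh", "like", "well", "so", "basically", "actually"}


def _norm(word: str) -> str:
    """Lowercase a word and strip trailing/leading punctuation."""
    return word.lower().strip(".,!?")


def pre_compress(turn: str) -> str:
    """Stage 1: drop filler words and immediate word repetitions."""
    kept: list[str] = []
    for word in turn.split():
        token = _norm(word)
        if token in FILLER:
            continue
        if kept and _norm(kept[-1]) == token:
            continue
        kept.append(word)
    return " ".join(kept)


def topic_key(turn: str, topics: dict[str, set[str]]) -> str:
    """Pick the topic whose keyword set overlaps the turn most, else 'misc'."""
    words = {_norm(w) for w in turn.split()}
    best = max(topics, key=lambda name: len(words & topics[name]), default=None)
    if best is None or not (words & topics[best]):
        return "misc"
    return best


@dataclass
class MemoryStore:
    topics: dict[str, set[str]]
    short_term: dict[str, list[str]] = field(default_factory=dict)
    long_term: list[tuple[str, str]] = field(default_factory=list)

    def observe(self, turn: str) -> None:
        """Stages 1-2: compress the turn, then buffer it under its topic."""
        compact = pre_compress(turn)
        if compact:
            key = topic_key(compact, self.topics)
            self.short_term.setdefault(key, []).append(compact)

    def flush(self) -> None:
        """Runtime path: append-only insert, one entry per buffered topic."""
        for topic, turns in self.short_term.items():
            # Joining turns stands in for an LLM-written summary.
            self.long_term.append((topic, " | ".join(turns)))
        self.short_term.clear()

    def consolidate(self) -> None:
        """Offline ("sleep-time") path: merge same-topic entries, dropping repeats."""
        merged: dict[str, list[str]] = {}
        for topic, text in self.long_term:
            parts = merged.setdefault(topic, [])
            for part in text.split(" | "):
                if part not in parts:
                    parts.append(part)
        self.long_term = [(t, " | ".join(p)) for t, p in merged.items()]
```

In this sketch, observe and flush would run during the conversation, while consolidate would be scheduled for idle time so that deduplication and merging never sit on the request path.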

Performance Results

On the LongMemEval benchmark, LightMem shows:

Accuracy improvement: up to 10.9%
Token reduction: up to 117×
API call reduction: up to 159×
Runtime reduction: >12×

Recent Updates and Features

Baseline evaluation framework across memory systems (Mem0, A-MEM, LangMem) on LoCoMo & LongMemEval
Demo video and tutorial notebooks for multiple scenarios
MCP Server integration for multi-tool memory invocation
Full LoCoMo dataset support
GLM-4.6 integration with reproducible scripts
Local deployment via Ollama, vLLM, Transformers with auto-load capability

Positioning and Use Cases

LightMem is designed as a modular memory layer that can integrate with various agent stacks including:

Long-context agents
Tool-using agents
Autonomous workflows
Conversational systems

The system provides structured memory that scales without exploding token counts. That makes it particularly useful for applied LLM teams and for developers working with agent frameworks, memory/RAG systems, and long-context models.

Availability

Paper: https://arxiv.org/abs/2510.18866

Code: https://github.com/zjunlp/LightMem

📖 Read the full source: r/LocalLLaMA
