LightMem: Lightweight Memory System for LLM Agents with Up to 10.9% Higher Accuracy and 100×+ Lower Token Cost

LightMem: A Practical Memory Layer for LLM Agents
LightMem is a lightweight, modular memory system for LLM agents. It targets long, multi-turn interactions, where context grows noisy and expensive, models get "lost in the middle," and existing memory systems add latency and token cost.
How LightMem Works
The system maintains compact, topical, and consistent memories through three key mechanisms:
- Pre-compress sensory memory: Filters redundant and low-value tokens before storage
- Topic-aware short-term memory: Clusters turns by topic and summarizes into precise memory units
- Sleep-time long-term consolidation: Inserts new memories incrementally at runtime and defers high-fidelity updates to an offline pass, so consolidation adds no latency to live interactions
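The three stages above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not LightMem's actual API or algorithms: the names `ToyMemory`, `precompress`, and `pick_topic`, the stopword list, and the keyword-overlap topic rule are all hypothetical stand-ins for the learned compression, topic segmentation, and LLM summarization a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stopword list; a stand-in for a real token-level compressor.
STOPWORDS = {"a", "an", "the", "um", "i", "to", "in", "is", "of", "and", "please"}


def precompress(turn: str) -> list[str]:
    """Stage 1: drop stopwords and repeated tokens before anything is stored."""
    kept: list[str] = []
    for raw in turn.lower().split():
        token = raw.strip(".,!?")
        if token and token not in STOPWORDS and token not in kept:
            kept.append(token)
    return kept


def pick_topic(tokens: list[str], topics: dict[str, set[str]]) -> str:
    """Stage 2: assign a turn to the topic with the largest keyword overlap."""
    best = max(topics, key=lambda name: len(topics[name] & set(tokens)))
    return best if topics[best] & set(tokens) else "misc"


@dataclass
class ToyMemory:
    topics: dict[str, set[str]]
    short_term: dict[str, list[str]] = field(default_factory=dict)
    long_term: dict[str, list[str]] = field(default_factory=dict)

    def observe(self, turn: str) -> None:
        """Compress the turn, then buffer its tokens under the matching topic."""
        tokens = precompress(turn)
        topic = pick_topic(tokens, self.topics)
        self.short_term.setdefault(topic, []).extend(tokens)

    def flush(self) -> None:
        """Stage 3, runtime part: append one unit per topic without merging."""
        for topic, tokens in self.short_term.items():
            # Order-preserving dedup stands in for a real summary.
            unit = " ".join(dict.fromkeys(tokens))
            self.long_term.setdefault(topic, []).append(unit)
        self.short_term.clear()

    def consolidate(self) -> None:
        """Stage 3, offline part: remove duplicate units within each topic."""
        for topic, units in list(self.long_term.items()):
            self.long_term[topic] = list(dict.fromkeys(units))


memory = ToyMemory(topics={"travel": {"flight", "hotel"}, "food": {"pizza", "vegan"}})
for turn in ["Um, please book the flight to Oslo.",
             "I like vegan pizza.",
             "Book a hotel in Oslo."]:
    memory.observe(turn)
memory.flush()        # cheap insert while the agent is running
memory.consolidate()  # heavier cleanup, run when the agent is idle
print(memory.long_term)
```

The point of the split between `flush` and `consolidate` is that the runtime path only appends, while the more expensive cleanup runs outside the interaction loop.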
Performance Results
On the LongMemEval benchmark, LightMem shows:
- Accuracy improvement: up to ~10.9%
- Token reduction: up to 117×
- API call reduction: up to 159×
- Runtime reduction: >12×
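To read the "N×" figures as savings, a reduction factor of N means roughly 1/N of the original cost remains. The helper below is a small arithmetic sketch; `percent_saved` is a hypothetical name, and the factors are the upper bounds quoted above (the runtime figure is a lower bound, so its real saving may be higher). The accuracy figure is a percentage rather than a factor, so it is not converted here.

```python
def percent_saved(reduction_factor: float) -> float:
    """Convert an N-times reduction into the percentage of cost removed."""
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    return (1.0 - 1.0 / reduction_factor) * 100.0


# Factors as reported in the list above.
for label, factor in [("tokens", 117), ("API calls", 159), ("runtime", 12)]:
    print(f"{label}: {factor}x reduction = {percent_saved(factor):.2f}% saved")
```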
Recent Updates and Features
- Baseline evaluation framework across memory systems (Mem0, A-MEM, LangMem) on LoCoMo & LongMemEval
- Demo video and tutorial notebooks for multiple scenarios
- MCP Server integration for multi-tool memory invocation
- Full LoCoMo dataset support
- GLM-4.6 integration with reproducible scripts
- Local deployment via Ollama, vLLM, Transformers with auto-load capability
Positioning and Use Cases
LightMem is designed as a modular memory layer that can integrate with various agent stacks including:
- Long-context agents
- Tool-using agents
- Autonomous workflows
- Conversational systems
The system provides structured memory that scales without exploding token counts, making it particularly useful for applied LLM teams and for developers working with agent frameworks, memory/RAG systems, and long-context models.
Availability
Paper: https://arxiv.org/abs/2510.18866
Code: https://github.com/zjunlp/LightMem
📖 Read the full source: r/LocalLLaMA