AgentMemory V4 Scores 96.2% on LongMemEval, Beats AI Rivals

agentmemory V4 is an open-source memory system for AI agents that just achieved a world record score of 96.2% on LongMemEval, the standard benchmark for long-term AI agent memory.

Benchmark Performance

The system outperformed several funded AI memory companies:

PwC Chronos: 95.6%
Mastra: 94.87%
OMEGA: 93.2% (raw)
Supermemory: 85.86%
Emergence AI: 86%
Zep: 71.2%

Development Details

Built solo in 16 days on a mid-range gaming PC (i3-12100F) with a total cost of $1,000. The system uses Claude Opus as a generator and GPT-4o as a judge, but the retrieval architecture is the core innovation.

Technical Architecture

The system combines multiple retrieval techniques in a single SQLite-backed system:

HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search
BM25 for traditional text retrieval
Cross-encoder for relevance scoring
Knowledge graph integration
Temporal grounding for time-aware memory retrieval

Availability

The system is open source under the MIT license and available at: github.com/JordanMcCann/agentmemory

📖 Read the full source: r/LocalLLaMA

agentmemory V4 achieves 96.2% on LongMemEval benchmark, outperforms commercial AI memory systems

Benchmark Performance

Development Details

Technical Architecture

Availability

👀 See Also

Token Enhancer reduces webpage token usage for AI agents

Super Claude browser extension tracks Claude AI usage velocity and limit predictions

Agentlint: GitHub App that catches CLAUDE.md contradictions and broken pointers on every PR

Microsoft DebugMCP VS Code Extension Gives AI Agents Debugging Capabilities