agentmemory V4 achieves 96.2% on the LongMemEval benchmark, outperforming commercial AI memory systems

agentmemory V4 is an open-source memory system for AI agents. It scored 96.2% on LongMemEval, a standard benchmark for long-term AI agent memory, which is reported as a record score for that benchmark.
Benchmark Performance
The 96.2% score is higher than the results reported for several funded AI memory companies:
- PwC Chronos: 95.6%
- Mastra: 94.87%
- OMEGA: 93.2% (raw)
- Emergence AI: 86%
- Supermemory: 85.86%
- Zep: 71.2%
Development Details
The project was built by a solo developer in 16 days on a mid-range gaming PC (Intel Core i3-12100F CPU) at a total cost of $1,000. It uses Claude Opus to generate answers and GPT-4o to judge them, but the retrieval architecture is the core innovation.
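To make the generator and judge roles concrete, here is a minimal sketch of an evaluation loop of that shape. It is not the project's code: `evaluate`, the three stub functions, and the two-question dataset are hypothetical names and data invented for illustration. In a real run the generator stub would call Claude Opus and the judge stub would call GPT-4o; no API calls are shown because the source does not give prompts or client code.

```python
def evaluate(dataset, retrieve, generate, judge):
    """Return the fraction of questions the judge marks as correct."""
    correct = 0
    for item in dataset:
        context = retrieve(item["question"])          # memory lookup
        answer = generate(item["question"], context)  # generator model
        if judge(item["question"], item["expected"], answer):  # judge model
            correct += 1
    return correct / len(dataset) if dataset else 0.0

# Hypothetical stand-ins so the sketch runs without any model access.
store = {"Which city does the user live in?": ["The user lives in Lisbon."]}

def stub_retrieve(question):
    return store.get(question, [])

def stub_generate(question, context):
    # A real generator would prompt an LLM with the retrieved context.
    return context[0] if context else "I don't know."

def stub_judge(question, expected, answer):
    # A real judge would ask an LLM whether the answer matches the reference.
    return expected.lower() in answer.lower()

dataset = [
    {"question": "Which city does the user live in?", "expected": "Lisbon"},
    {"question": "What coffee does the user like?", "expected": "dark roast"},
]

accuracy = evaluate(dataset, stub_retrieve, stub_generate, stub_judge)
print(f"accuracy: {accuracy:.1%}")  # prints "accuracy: 50.0%"
```

Keeping the three stages as separate callables means a retrieval change can be measured with the generator and judge held fixed, which matters when retrieval is the part being improved.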
Technical Architecture
The system combines several retrieval techniques on top of a single SQLite database:
- HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search
- BM25 for traditional text retrieval
- Cross-encoder for relevance scoring
- Knowledge graph integration
- Temporal grounding for time-aware memory retrieval
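The sketch below shows one way a few of these pieces could fit together. It is not agentmemory's implementation. It makes these assumptions: brute-force cosine similarity stands in for an HNSW index, reciprocal rank fusion is used to merge the lexical and vector rankings (the source does not say how results are merged), temporal grounding is reduced to a simple `since` filter, and the cross-encoder, knowledge graph, and SQLite storage are left out. All function names, the toy vectors, and the `day` field are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(scores):
    """Indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; items ranked high in any list rise to the top."""
    fused = Counter()
    for ranking in rankings:
        for position, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + position + 1)
    return [idx for idx, _ in fused.most_common()]

def retrieve(query, query_vec, memories, since=None, top_k=2):
    # Assumed form of temporal grounding: keep only memories from `since` on.
    candidates = [m for m in memories if since is None or m["day"] >= since]
    if not candidates:
        return []
    lexical = rank(bm25_scores(query, [m["text"] for m in candidates]))
    semantic = rank([cosine(query_vec, m["vec"]) for m in candidates])
    fused = reciprocal_rank_fusion([lexical, semantic])
    return [candidates[i]["text"] for i in fused[:top_k]]

# Toy memories with hand-written 3-dimensional "embeddings".
memories = [
    {"text": "User moved to Berlin in March", "vec": [0.9, 0.1, 0.0], "day": 10},
    {"text": "User prefers dark roast coffee", "vec": [0.0, 0.2, 0.9], "day": 20},
    {"text": "User moved to Lisbon in June", "vec": [0.8, 0.3, 0.1], "day": 40},
]

all_time = retrieve("where has the user moved", [1.0, 0.2, 0.0], memories)
recent = retrieve("where has the user moved", [1.0, 0.2, 0.0], memories, since=30)
print(recent)  # ['User moved to Lisbon in June']
```

In a fuller pipeline, a cross-encoder would typically rescore the fused top candidates against the query before they are handed to the generator. Rank-based fusion is chosen here because BM25 and cosine scores are on different scales and cannot be added directly.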
Availability
The system is open source under the MIT license and available at: github.com/JordanMcCann/agentmemory
📖 Read the full source: r/LocalLLaMA