Benchmark Results: Claude Agent Swarm with Memory System Shows 30-43% Token Cost Savings

Memory System Benchmark for Claude Agent Swarms
A developer has been building a memory system called Stompy for nine months. Over that time its storage has moved from flat files to SQLite and then to PostgreSQL. The goal was to minimize token usage when running Claude agent swarms. To measure the effect, the developer benchmarked swarms with and without the memory system.
Test Setup
The benchmark used a 40-point coding task requiring a full booking feature with backend, frontend, and tests. A 6-agent swarm was tested with three different Claude models as lead: Sonnet 4.6, Opus 4.6, and Haiku 4.5. All runs used the same codebase, the same teammates, and the same scoring system. Teammate agents always ran Opus regardless of the lead model. Each result below gives the score, cost, wall-clock time, and number of turns.
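The setup varies only two things, the lead model and whether memory is enabled, while the teammates stay fixed on Opus. A minimal sketch encoding that 3 × 2 matrix in Python follows. The class and field names are our own, and splitting the 6-agent swarm into 1 lead + 5 teammates is an assumption the source does not spell out.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical encoding of the benchmark matrix; model labels are plain strings,
# not API model IDs.
LEAD_MODELS = ("Sonnet 4.6", "Opus 4.6", "Haiku 4.5")
TEAMMATE_MODEL = "Opus"  # teammates always ran Opus, whatever the lead
SWARM_SIZE = 6           # assumption: 1 lead + 5 teammates


@dataclass(frozen=True)
class Condition:
    lead_model: str
    memory: bool
    teammate_model: str = TEAMMATE_MODEL
    teammates: int = SWARM_SIZE - 1


def build_conditions() -> list:
    """One condition per (lead model, memory on/off) pair; everything else is held fixed."""
    return [Condition(lead, mem) for lead, mem in product(LEAD_MODELS, (True, False))]


if __name__ == "__main__":
    for c in build_conditions():
        print(f"{c.lead_model:<10} memory={c.memory!s:<5} teammates={c.teammates}x {c.teammate_model}")
```

Keeping the condition frozen and generating it with `product` makes it explicit that the only independent variables are the lead model and the memory flag.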
Benchmark Results
- Sonnet 4.6 + memory: 40/40, $3.98, 6.5min, 2 turns
- Sonnet 4.6 no memory: 40/40, $7.04, 9.6min, 4 turns
- Opus 4.6 + memory: 40/40, $4.34, 9.6min, 29 turns
- Opus 4.6 no memory: 40/40, $7.65, 10.0min, 70 turns
- Haiku 4.5 + memory: 39/40, $4.95, 7.5min, 2 turns
- Haiku 4.5 no memory: 0/40, $3.97, 5.8min, 3 turns
Key Findings
Opus and Sonnet with memory saved about 43% on cost compared to running without memory. The developer notes that these models are smart enough to complete the task without memory, but they burn tokens on codebase exploration that the memory system eliminates.
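The roughly 43% figure can be reproduced from the listed costs. The sketch below copies the results list into a dictionary (the layout and the `cost_savings` name are our own) and computes the fractional cost reduction per lead model. The same formula comes out negative for Haiku, because its memory run cost more; that run, however, was the only Haiku run that completed the task.

```python
# Figures copied from the results list: (score, cost in USD, minutes, turns).
RESULTS = {
    ("Sonnet 4.6", True): (40, 3.98, 6.5, 2),
    ("Sonnet 4.6", False): (40, 7.04, 9.6, 4),
    ("Opus 4.6", True): (40, 4.34, 9.6, 29),
    ("Opus 4.6", False): (40, 7.65, 10.0, 70),
    ("Haiku 4.5", True): (39, 4.95, 7.5, 2),
    ("Haiku 4.5", False): (0, 3.97, 5.8, 3),
}


def cost_savings(lead: str) -> float:
    """Fractional cost reduction from enabling memory (negative means memory cost more)."""
    with_memory = RESULTS[(lead, True)][1]
    without_memory = RESULTS[(lead, False)][1]
    return (without_memory - with_memory) / without_memory


if __name__ == "__main__":
    for lead in ("Sonnet 4.6", "Opus 4.6", "Haiku 4.5"):
        print(f"{lead}: {cost_savings(lead):+.1%}")
```

Run as a script, this prints +43.5% for Sonnet, +43.3% for Opus, and -24.7% for Haiku.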
The Haiku result was unexpected: it scored 0/40 without memory but 39/40 with memory. The developer observed that Haiku couldn't coordinate the Opus teammate agents without understanding the project structure, but became a competent lead with memory access.
Sonnet with memory was the best overall configuration: it matched memoryless Opus's 40/40 score while finishing faster (6.5 vs 10.0 min), in far fewer turns (2 vs 70), and at roughly half the cost ($3.98 vs $7.65). The developer's takeaway is that giving the model access to project knowledge matters more than paying for a more expensive model.
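Tallying the listed numbers metric by metric shows where Sonnet with memory comes out ahead of memoryless Opus and where the two are level. The sketch below uses our own names and a simple rule: a higher score is better, and lower cost, time, and turns are better.

```python
# (score, cost in USD, minutes, turns), copied from the results list.
SONNET_WITH_MEMORY = (40, 3.98, 6.5, 2)
OPUS_NO_MEMORY = (40, 7.65, 10.0, 70)
METRICS = ("score", "cost_usd", "minutes", "turns")
HIGHER_IS_BETTER = {"score": True, "cost_usd": False, "minutes": False, "turns": False}


def compare(a, b):
    """Return a per-metric verdict ('better', 'worse', 'tie') for configuration a relative to b."""
    verdicts = {}
    for name, x, y in zip(METRICS, a, b):
        if x == y:
            verdicts[name] = "tie"
        elif (x > y) == HIGHER_IS_BETTER[name]:
            verdicts[name] = "better"
        else:
            verdicts[name] = "worse"
    return verdicts


if __name__ == "__main__":
    print(compare(SONNET_WITH_MEMORY, OPUS_NO_MEMORY))
    print(f"cost ratio: {SONNET_WITH_MEMORY[1] / OPUS_NO_MEMORY[1]:.2f}")
```

The score comes out as a tie, the other three metrics favor Sonnet with memory, and the cost ratio is 0.52.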
Technical Details
Stompy is accessible via MCP, an API, or a CLI, and works with Claude Code. The benchmark setup is available on GitHub for others to use or improve. The developer cautions that each condition has been run only once so far (n=1), with more runs planned.
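The source does not describe Stompy's internals beyond its storage history and its MCP/API/CLI access. To illustrate the general idea the benchmark tests (a lead agent reading stored project knowledge instead of re-exploring the codebase), here is a minimal sketch of a topic-keyed store on SQLite, one of the backends Stompy went through. The class name, schema, and methods are hypothetical and are not Stompy's API.

```python
import sqlite3
from typing import Optional


class ProjectMemory:
    """Hypothetical topic-keyed store of project knowledge (not Stompy's schema or API)."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps the sketch self-contained; pass a file path to persist across sessions.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts (topic TEXT PRIMARY KEY, content TEXT NOT NULL)"
        )

    def remember(self, topic: str, content: str) -> None:
        # INSERT OR REPLACE overwrites an existing row with the same topic,
        # so a later session can refresh stale knowledge.
        self.conn.execute(
            "INSERT OR REPLACE INTO facts (topic, content) VALUES (?, ?)", (topic, content)
        )
        self.conn.commit()

    def recall(self, topic: str) -> Optional[str]:
        # Return the stored note, or None if the topic was never recorded.
        row = self.conn.execute(
            "SELECT content FROM facts WHERE topic = ?", (topic,)
        ).fetchone()
        return row[0] if row else None

    def topics(self) -> list:
        # List what is already known, e.g. so an agent can decide whether it needs to explore.
        return [r[0] for r in self.conn.execute("SELECT topic FROM facts ORDER BY topic")]


if __name__ == "__main__":
    mem = ProjectMemory()
    # Hypothetical note that an earlier session might have saved.
    mem.remember("booking/backend", "hypothetical: where booking endpoints and models live")
    print(mem.topics())
    print(mem.recall("booking/backend"))
```

In a setup like the one benchmarked, a lead agent would consult notes like these (through whatever tool interface the memory system exposes) before delegating, rather than spending turns rediscovering the project layout.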
📖 Read the full source: r/ClaudeAI
👀 See Also

ClawProxy: Self-Hosted AI Routing Proxy with Dashboard
ClawProxy is an open-source, self-hosted proxy that centralizes management of multiple AI API keys and models. It provides a unified endpoint, smart key rotation, provider fallback, and real-time logging via a React dashboard.

A/B Test Results: oh-my-claudecode Hooks Show Minimal Impact on Claude Code Performance
A developer spent 7% of their weekly Max20 tokens testing oh-my-claudecode hooks with Claude Sonnet 4.6, finding no meaningful improvement in code quality or cost for a single-session coding task.

Reddit User Shares AI Tool for Gathering Financial Account Balances
A Reddit post on r/openclaw presents an AI agent designed to streamline the collection of financial account balances using Python. Users discuss automation potential via custom scripts leveraging APIs like Plaid.

LLM Agent Builds Complete Godot 4 Dungeon Crawler Using Visual Feedback
A developer connected an LLM agent to Godot 4 using an MCP tool and gave it a single prompt to build a dungeon crawler FPS. The agent created a complete prototype with 3 rooms, lighting, combat, enemies, and progression by running the game, taking screenshots, and fixing visual issues.