Benchmark shows context engine reduces AI coding agent costs by 3x on SWE-bench

✍️ OpenClawRadar | 📅 Published: March 23, 2026

A developer benchmarked four AI coding agents on SWE-bench Verified using the same Claude Opus 4.5 model, with context management as the only variable. Pass@1 scores landed within three points of each other (70% to 73%), while cost per task ranged from $0.67 to $1.98.

Benchmark setup

The test used a 100-task stratified subset of SWE-bench Verified with all 12 repositories represented proportionally. All agents ran Claude Opus 4.5 with the same $3/task budget and 250-turn limit. The only difference was the context layer in front of the model.
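The source does not describe how the stratified subset was drawn. As a rough illustration only, a proportional sample could be built as below. The allocation rule (largest remainder with a floor of one task per repository), the function names, and the repository counts are all assumptions made for this sketch, not details of the actual benchmark harness.

```python
import random


def allocate(counts, total):
    """Split `total` slots across strata in proportion to their sizes,
    keeping at least one slot per stratum (an assumed rule)."""
    population = sum(counts.values())
    if not len(counts) <= total <= population:
        raise ValueError("total must be between the stratum count and the population")
    quotas = {repo: n * total / population for repo, n in counts.items()}
    alloc = {repo: max(1, int(q)) for repo, q in quotas.items()}
    # Hand out any remaining slots to the largest fractional remainders.
    by_remainder = sorted(counts, key=lambda r: quotas[r] - int(quotas[r]), reverse=True)
    i = 0
    while sum(alloc.values()) < total:
        repo = by_remainder[i % len(by_remainder)]
        if alloc[repo] < counts[repo]:
            alloc[repo] += 1
        i += 1
    # If the one-slot floor overshot the total, trim the largest strata.
    while sum(alloc.values()) > total:
        largest = max(alloc, key=alloc.get)
        alloc[largest] -= 1
    return alloc


def stratified_sample(tasks_by_repo, total, seed=0):
    """Draw a reproducible sample with the per-repo sizes from allocate()."""
    rng = random.Random(seed)
    alloc = allocate({r: len(t) for r, t in tasks_by_repo.items()}, total)
    return {r: rng.sample(tasks_by_repo[r], k) for r, k in alloc.items()}


# Hypothetical repository sizes, NOT the real SWE-bench Verified distribution.
HYPOTHETICAL_COUNTS = {"repo_a": 300, "repo_b": 150, "repo_c": 45, "repo_d": 5}
ALLOCATION = allocate(HYPOTHETICAL_COUNTS, 100)
```

The one-task floor matters for very small repositories, which would otherwise round down to zero in a 100-task subset.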

Results

  • Context engine + Claude Code: 73.0% Pass@1, $0.67/task
  • Live-SWE-Agent: 72.0% Pass@1, $0.86/task
  • OpenHands: 70.0% Pass@1, $1.77/task
  • Sonar Foundation: 70.0% Pass@1, $1.98/task

The most expensive setup (Sonar Foundation, $1.98/task) cost about 3x as much per task as the cheapest (context engine + Claude Code, $0.67/task) while resolving fewer tasks. Eight tasks were solved only by the setup with the context layer: bugs that the model couldn't fix without seeing the right code.
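The cost comparison can be checked directly from the four reported rows. The snippet below is a small sketch that recomputes each setup's cost relative to the cheapest one. The "cost per resolved task" column is a derived figure that does not appear in the source; it assumes the reported $/task is averaged over all 100 attempted tasks.

```python
# Pass@1 (%) and average cost per task (USD), copied from the results list.
RESULTS = {
    "Context engine + Claude Code": (73.0, 0.67),
    "Live-SWE-Agent": (72.0, 0.86),
    "OpenHands": (70.0, 1.77),
    "Sonar Foundation": (70.0, 1.98),
}


def summarize(results, baseline):
    """Return cost relative to `baseline` and cost per resolved task."""
    base_cost = results[baseline][1]
    summary = {}
    for name, (pass_rate, cost) in results.items():
        summary[name] = {
            "cost_ratio": round(cost / base_cost, 2),
            # Derived metric (an assumption, not reported in the source).
            "cost_per_resolved": round(cost / (pass_rate / 100), 2),
        }
    return summary


SUMMARY = summarize(RESULTS, "Context engine + Claude Code")
for name, row in SUMMARY.items():
    print(f"{name}: {row['cost_ratio']:.2f}x baseline cost, "
          f"${row['cost_per_resolved']:.2f} per resolved task")
```

With these numbers the most expensive setup comes out at about 2.96x the cheapest, which the headline rounds to 3x.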

Limitations

On matplotlib (rendering-heavy, visual output code), the context engine scored 43% while Sonar Foundation hit 86%. Graph-based context is less effective when relevant code doesn't follow dependency chains.


How the context layer works

Instead of letting Claude read entire files, the context engine pre-indexes the codebase into a dependency graph using tree-sitter + SQLite (30 languages supported) and returns a ranked context capsule: full source for the functions that matter, and skeletonized signatures for everything connected to them. The agent starts every task already knowing what's relevant.
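To make the idea concrete, here is a toy sketch of a dependency graph stored in SQLite and a "capsule" query over it. It is not the tool's implementation: Python's built-in `ast` module stands in for tree-sitter, the table layout and the `index` and `capsule` functions are invented for illustration, and ranking is reduced to "target in full, one-hop neighbors as signatures".

```python
import ast
import sqlite3

SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))

def unrelated():
    return 42
'''


def index(source):
    """Store each top-level function's body, signature, and call edges."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE symbols (name TEXT PRIMARY KEY, signature TEXT, body TEXT)")
    db.execute("CREATE TABLE edges (caller TEXT, callee TEXT)")
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        args = ", ".join(a.arg for a in node.args.args)
        db.execute(
            "INSERT INTO symbols VALUES (?, ?, ?)",
            (node.name, f"def {node.name}({args}): ...",
             ast.get_source_segment(source, node)),
        )
        # Record direct calls to plain names as caller -> callee edges.
        for call in ast.walk(node):
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                db.execute("INSERT INTO edges VALUES (?, ?)", (node.name, call.func.id))
    return db


def capsule(db, target):
    """Full source for `target`, signatures only for its graph neighbors."""
    row = db.execute("SELECT body FROM symbols WHERE name = ?", (target,)).fetchone()
    neighbors = db.execute(
        """SELECT DISTINCT s.name, s.signature
           FROM symbols s JOIN edges e
             ON (e.caller = ? AND e.callee = s.name)
             OR (e.callee = ? AND e.caller = s.name)
           ORDER BY s.name""",
        (target, target),
    ).fetchall()
    return {"full": row[0] if row else None,
            "skeletons": [signature for _, signature in neighbors]}


DB = index(SOURCE)
CAPSULE = capsule(DB, "run")
```

Asking for `run` returns its full body plus the signatures of `load` and `parse`, while `unrelated` never enters the context. Calls to names that are not indexed, such as the built-in `open`, are dropped by the join.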

The engine also includes memory that persists across sessions via MCP. When code changes, previous observations get flagged as stale automatically, so the agent doesn't re-explore the same things.
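The source does not say how stale observations are detected. One plausible approach, shown here purely as an assumption, is to store a content hash next to each observation and compare it against the current code. The `Observation` and `SessionMemory` names are hypothetical, and this sketch keeps everything in memory instead of persisting it or exposing it over MCP.

```python
import hashlib
from dataclasses import dataclass


def digest(code: str) -> str:
    """Content hash used to detect that a symbol's source has changed."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


@dataclass
class Observation:
    symbol: str
    note: str
    code_hash: str
    stale: bool = False


class SessionMemory:
    """Hypothetical in-memory store; a real tool would persist this."""

    def __init__(self):
        self.observations = []

    def record(self, symbol, code, note):
        # Remember what was learned and the exact code it was learned from.
        self.observations.append(Observation(symbol, note, digest(code)))

    def refresh(self, current_code):
        """Flag observations whose symbol changed or no longer exists."""
        for obs in self.observations:
            code = current_code.get(obs.symbol)
            obs.stale = code is None or digest(code) != obs.code_hash

    def fresh(self):
        return [obs for obs in self.observations if not obs.stale]


memory = SessionMemory()
memory.record("parse", "def parse(text): return text.split()", "splits on whitespace")
memory.record("load", "def load(path): return open(path).read()", "reads whole file")

# Later session: parse was edited, load is unchanged.
memory.refresh({
    "parse": "def parse(text): return text.split(',')",
    "load": "def load(path): return open(path).read()",
})
```

After the refresh, only the note about `load` is still considered fresh; the note about `parse` is flagged so the agent knows to re-read that function.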

The system is 100% local with no cloud, no account, and no code leaving your machine. It works with Claude Code and 11 other agents via MCP.

Open source availability

The benchmark harness, all evaluation logs, per-instance results, and comparison scripts are available on GitHub at github.com/Vexp-ai/vexp-swe-bench. The tool itself is available at vexp.dev with a free tier, VS Code extension, or CLI. Full benchmark results with charts are at vexp.dev/benchmark.

📖 Read the full source: r/ClaudeAI
