Dirac: Open-Source Agent Tops TerminalBench with 65.2%, Cheaper and Open

Dirac is an open-source coding agent that posted a 65.2% score on TerminalBench 2.0 with gemini-3-flash-preview, ahead of Google's official baseline of 47.6% and of the previous top entry for that model, the closed-source agent Junie CLI, at 64.3%. The run used only the public open-source code, with no benchmark-specific AGENTS.md files or other cheating mechanisms. The maintainer submitted a PR to the leaderboard 8 days ago but has not received a response, reportedly because of a backlog.
Key Features
- Hash-anchored parallel edits for efficient and accurate code changes.
- AST manipulation to understand and transform code structurally.
- Context curation to keep context tightly focused, improving accuracy and reducing costs; the project claims a 64.8% average cost reduction vs other agents.
- No MCP (Model Context Protocol) — straightforward tooling.
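
The article does not describe how Dirac implements hash-anchored edits, so the following is only a minimal sketch of the general idea, under stated assumptions: each edit names its target line by a short hash of that line's content instead of by line number, and every anchor is resolved against the original file. The helper names (line_hash, apply_edits) and the 8-character SHA-256 prefix are hypothetical and are not taken from Dirac's code.

```python
import hashlib


def line_hash(line: str) -> str:
    """Short content hash used as an edit anchor (hypothetical scheme)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()[:8]


def apply_edits(lines: list[str], edits: list[tuple[str, str]]) -> list[str]:
    """Apply (anchor_hash, replacement) edits to a copy of `lines`.

    Anchors are resolved against the ORIGINAL lines, so the edits do not
    depend on each other's order. An anchor that matches zero or several
    lines is rejected rather than applied to the wrong place.
    """
    index: dict[str, list[int]] = {}
    for i, line in enumerate(lines):
        index.setdefault(line_hash(line), []).append(i)

    result = list(lines)
    for anchor, replacement in edits:
        positions = index.get(anchor, [])
        if len(positions) != 1:
            raise ValueError(f"anchor {anchor} matched {len(positions)} lines")
        result[positions[0]] = replacement
    return result


source = [
    "def add(a, b):",
    "    return a - b",  # bug: should add
    "",
    "def mul(a, b):",
    "    return a + b",  # bug: should multiply
]

# Two independent edits, each anchored to the content of its target line.
edits = [
    (line_hash("    return a - b"), "    return a + b"),
    (line_hash("    return a + b"), "    return a * b"),
]
patched = apply_edits(source, edits)
```

In this sketch the second edit still lands on the body of mul even though the first edit produces an identical-looking line, because anchors are looked up in the original text. A stale anchor (the file changed since the hash was taken) fails loudly instead of silently editing the wrong line, which is one plausible reason such a scheme could help accuracy.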
TerminalBench 2.0 Results
Scored on gemini-3-flash-preview: 65.2% vs Google's 47.6% and Junie CLI's 64.3%. The run was done in a leaderboard-compliant way (no resource or timeout modifications). All code is on GitHub — no difference between what was run and what is public.
Cost Comparison
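Dirac's average cost per task across 8 benchmarks (against Cline, Kilo, Ohmypi, Opencode, Pimono, Roo) was $0.18, vs the next best at $0.38. The claimed 64.8% reduction (about 2.8x cheaper) does not follow from those two figures alone, which work out to roughly a 53% reduction (2.1x); it presumably compares Dirac with the other agents' average cost rather than with the runner-up. For example, Task1 (transformers, 8 files) cost $0.13 vs Cline's $0.37. Task6 (transformers, 25 files) cost $0.34 vs Ohmypi's $0.94.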
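
To make the cost figures easier to check, here is a small arithmetic sketch that uses only the dollar amounts quoted above. The function names are illustrative, and the implied baseline is a back-calculation from the 64.8% claim, not a number reported by the project.

```python
def reduction(ours: float, theirs: float) -> tuple[float, float]:
    """Return (percent cost reduction, 'times cheaper' factor)."""
    percent = (1 - ours / theirs) * 100
    factor = theirs / ours
    return percent, factor


def implied_baseline(ours: float, percent_reduction: float) -> float:
    """Comparison cost that a claimed percent reduction would imply."""
    return ours / (1 - percent_reduction / 100)


# Dollar figures as quoted in the article.
comparisons = {
    "average vs next best": (0.18, 0.38),
    "Task1 vs Cline": (0.13, 0.37),
    "Task6 vs Ohmypi": (0.34, 0.94),
}

for label, (ours, theirs) in comparisons.items():
    percent, factor = reduction(ours, theirs)
    print(f"{label}: {percent:.1f}% reduction, {factor:.2f}x cheaper")

# Baseline that a 64.8% reduction from $0.18 would correspond to
# (back-calculated, not reported by the project).
print(f"implied baseline: ${implied_baseline(0.18, 64.8):.2f}")
```

Run as written, this gives about 52.6% (2.11x) for $0.18 vs $0.38, about 64.9% (2.85x) for Task1, and about 63.8% (2.76x) for Task6. A 64.8% reduction from $0.18 would imply a comparison cost of about $0.51, which is consistent with the figure being measured against an average of the other agents rather than against the $0.38 runner-up.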
Installation & Usage
Clone the repo and follow setup instructions in the README.md. The agent runs as a CLI tool. No special setup beyond Node.js and API keys for the chosen model.
📖 Read the full source: HN AI Agents