Comparing Multi-Agent AI Systems: Anthropic's Harness vs Agyn's Engineering Org Model

Anthropic has published a harness design for long-running application development, while the Agyn multi-agent system for team-based autonomous software engineering was open-sourced last month on arXiv. Both approaches reject the "monolithic agent" model and instead structure AI agents to work like real engineering teams with role separation, structured handoffs, and review loops.
Core Architecture Differences
Anthropic's system uses a GAN-inspired architecture with three roles: planner → generator → evaluator. The evaluator uses Playwright to interact with the running application like a real user, then provides structured critique back to the generator.
Agyn models the process as an engineering organization with four roles: coordination → research → implementation → review. Agents operate in isolated sandboxes and communicate through defined contracts.
Shared Solutions to Common Problems
- Models losing coherence over long tasks: Anthropic uses context resets with structured handoff artifacts, while Agyn uses compaction with structured handoffs between roles
- Self-evaluation being too lenient: Both systems separate evaluation from generation. Anthropic uses a separate evaluator agent calibrated on few-shot examples, while Agyn has a dedicated review role separated from implementation
- Ambiguous "done" criteria: Anthropic uses sprint contracts negotiated before work starts, while Agyn has a task specification phase with explicit acceptance criteria and required tests
- Complex task decomposition: Anthropic's planner expands one-sentence prompts into full specifications, while Agyn's researcher agent decomposes issues and produces specifications before implementation begins
- Context anxiety: Anthropic uses resets for clean slates, while Agyn uses compaction with a memory layer
Agyn's Distinctive Features
Agyn includes two features not present in Anthropic's harness:
- Isolated sandboxes per agent: Each agent operates in its own isolated file and network namespace, preventing collisions on shared state during parallel or sequential work
- GitHub as shared state: The system uses GitHub primitives (commits, comments, PRs, reviews) that human teams already understand, providing a full audit log without requiring custom communication protocols
Implementation Differences
Anthropic's harness is built tightly around Claude using the Claude Agent SDK and Playwright MCP for the evaluation loop. The evaluator navigates live running applications before scoring.
Agyn is model-agnostic by design, supporting Claude, Codex, and open-weight models. The system allows mixing different models per role, which in practice has been found to outperform using one model for everything.
📖 Read the full source: r/ClaudeAI
👀 See Also

Introducing OneTool MCP: An Open Source Multi-Tool for Developers
OneTool MCP, built using Claude AI, offers developers over 100 tools for tasks like web searches, library updates, and file management without tool tax or context rot.

Claude Toolbox extension adds message-level bookmarks and full-text search
Claude Toolbox is a Chrome extension that lets you bookmark individual messages, full-text search across conversations, and export as TXT or JSON. Free tier covers 2 conversations; paid at $5/month or $49 lifetime.

Building a Geological Clock with Claude Code: Single HTML + Three.js
A product designer built eona.earth, a geological clock that maps Earth's 4.5 billion years onto 12 hours, using Claude Code, Three.js, and custom WebGL shaders — all as a single HTML file with no build step.

Qwen 3.6 27B with MTP on V100 32GB: 54 t/s via llama.cpp Branch
am17an's MTP branch of llama.cc runs Qwen 3.6 27B at 54 t/s on V100 32GB via PCIe adapter, dropping to 29-30 t/s without MTP.