Practical Findings from 11 Multi-Agent Software Builds Without Programmatic Scaffolding

Key Technical Findings from Multi-Agent System Experiments
Analysis of 11 autonomous multi-agent software builds without programmatic scaffolding, based on 295M tokens, 98 agent sessions, and 6.1M lines of worker output, reveals practical insights for developers working with AI coding agents.
Scope Enforcement and Orchestration
Scope enforcement is solved mechanically, not through prompts: Prompt-based approaches succeeded in 0 of 20 trials under compiler pressure, while the mechanical approach (letting agents edit everything, then reverting out-of-scope files with git) succeeded in 20 of 20. The key insight: don't ask models to respect boundaries; enforce them after the fact.
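A minimal sketch of the after-the-fact enforcement described above, assuming the worker edits a git working tree and each task carries a list of allowed directories. The function names and the use of `git checkout HEAD -- <paths>` are illustrative assumptions; the source only says out-of-scope files were reverted with git and does not give the exact commands.

```python
import subprocess
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_dirs):
    """Return the changed paths that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # A path is in scope if an allowed directory is one of its ancestors.
        if not any(a in path.parents for a in allowed):
            violations.append(raw)
    return violations


def revert_out_of_scope(repo_dir, allowed_dirs):
    """Restore out-of-scope tracked files to HEAD and return what was reverted.

    Hypothetical helper: it handles only modified or deleted tracked files.
    Newly created files would need separate cleanup.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=MD", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    changed = [line for line in result.stdout.splitlines() if line]
    bad = out_of_scope(changed, allowed_dirs)
    if bad:
        subprocess.run(
            ["git", "checkout", "HEAD", "--", *bad],
            cwd=repo_dir, check=True,
        )
    return bad
```

The worker is never told about the boundary in this sketch. It can touch anything to get the build passing, and the scope check runs once the session ends.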
Orchestrator costs are memory-bound: Approximately 95% of input spend is re-reading conversation history. The "statefulness premium" means a frontier orchestrator that writes zero shipped code can cost as much as the entire worker fleet. Optimization should target fewer turns and less re-ingestion, not cheaper reasoning.
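To see why turn count dominates, here is a toy model of re-ingestion. It assumes every turn adds the same number of new tokens and that the full history is re-sent on each turn, with no prompt caching. The turn count and token sizes are made-up illustration values, not figures from the experiments.

```python
def input_token_breakdown(turns, new_tokens_per_turn):
    """Split total input tokens into first-time reads and re-reads of history."""
    fresh = 0      # tokens the model sees for the first time
    reread = 0     # tokens re-sent because they are part of prior history
    history = 0    # size of the conversation so far
    for _ in range(turns):
        reread += history            # the whole prior conversation is sent again
        fresh += new_tokens_per_turn
        history += new_tokens_per_turn
    return fresh, reread


# Hypothetical session: 40 turns, 2,000 new tokens each.
fresh, reread = input_token_breakdown(turns=40, new_tokens_per_turn=2_000)
print(f"re-read share of input: {reread / (fresh + reread):.1%}")

# Same per-turn size, half the turns.
fresh_half, reread_half = input_token_breakdown(turns=20, new_tokens_per_turn=2_000)
print(f"total input at 40 turns: {fresh + reread:,}")
print(f"total input at 20 turns: {fresh_half + reread_half:,}")
```

Under these assumptions the re-read share is (T - 1) / (T + 1) for T turns, so a 40-turn session lands at about 95%, and halving the turn count cuts total input tokens to roughly a quarter. Re-read volume grows with the square of the number of turns.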
Coordination and Scaling Dynamics
Models don't independently discover coordination: Bare-prompt Opus with full tool access never delegated, never wrote specs, and never discovered parallel dispatch—it just built everything solo. The coordination template does real work.
Depth scales differently than quality: Flat dispatch beats hierarchy at ≤10 domains on throughput, token efficiency, and wall time. Above 10 domains, hierarchy enables parallelism that flat dispatch can't achieve.
Solo outperforms coordination until context limits bind: Solo throughput is approximately 325 LOC/min and invariant to project size. Pyramid throughput scales with workers. Below ~30K LOC, delegation is pure overhead.
Worker Performance and Type Systems
Worker model capability drives throughput: With the same architecture and the same spec, three worker models produced 17,761, 6,001, and 1,818 LOC, a 9.8x gap between best and worst. Architecture enables parallel throughput; the worker model determines it.
Type contracts provide shared vocabulary: Integration succeeds without contracts at every scale tested (6–36 modules), even under read-only constraints. But without contracts, parallel workers silently produce structurally incompatible types that compile only because nothing cross-references. A single 984-line contract written blind held across 10 independent domains.
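As an illustration of what a shared contract buys, the sketch below puts a small contract at the top and two independently written "worker" modules beneath it. All names (`DamageEvent`, `DamageSink`, `HealthPool`, `make_hit`) are hypothetical; the source does not show the contents of its 984-line contract.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# --- contract (hypothetical): written once, read by every worker ---
@dataclass(frozen=True)
class DamageEvent:
    source_id: str
    target_id: str
    amount: int


@runtime_checkable
class DamageSink(Protocol):
    def apply(self, event: DamageEvent) -> int:
        """Consume an event and return the remaining hit points."""
        ...


# --- worker A (combat domain): produces events in the agreed shape ---
def make_hit(attacker: str, defender: str, amount: int) -> DamageEvent:
    return DamageEvent(source_id=attacker, target_id=defender, amount=amount)


# --- worker B (health domain): consumes them without ever seeing worker A ---
class HealthPool:
    def __init__(self, hp: int) -> None:
        self.hp = hp

    def apply(self, event: DamageEvent) -> int:
        self.hp = max(0, self.hp - event.amount)
        return self.hp
```

Without the shared `DamageEvent`, each worker would likely invent its own event shape. Both modules would still compile in isolation, and the mismatch would surface only when something finally wired them together.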
Type contracts eliminate coordination overhead at scale: Controlled scaling test (1–20 workers, fixed spec) showed zero integration errors across 50 domain builds. Sweet spot at 10 workers: 2.05x wall-time speedup. At 20 workers, serial phase dependencies negate parallelism gains (Amdahl's serial fraction ~44%).
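The reported numbers are roughly what Amdahl's law predicts. The sketch below plugs the ~44% serial fraction from the text into the standard formula, speedup = 1 / (s + (1 - s) / N). Treating the build as a single fixed serial fraction is a simplifying assumption; the source does not say how the 44% was derived.

```python
def amdahl_speedup(serial_fraction, workers):
    """Ideal speedup when only the non-serial part is split across workers."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


SERIAL_FRACTION = 0.44  # figure quoted in the text

for n in (1, 5, 10, 20):
    print(f"{n:>2} workers: {amdahl_speedup(SERIAL_FRACTION, n):.2f}x")

# Upper bound as the worker count grows without limit.
print(f"ceiling: {1.0 / SERIAL_FRACTION:.2f}x")
```

With s = 0.44 the model gives about 2.0x at 10 workers and about 2.1x at 20, against a ceiling near 2.27x. Doubling the fleet from 10 to 20 buys only a few percent, which is consistent with the observed sweet spot at 10.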
Context and Delegation Patterns
Context priming works; format doesn't matter: 0% formula transfer cold, 100% with design context present (N=10 per condition). A static reference document produces identical transfer rates to a synthetic boot conversation.
Delegation compression is inherent: Each delegation layer acts as a lossy summarizer. Quantitative requirements ("80 weapons") vanish; structural requirements (type interfaces) survive. Fix: workers should read full specs from the filesystem rather than relying on compressed prompt chains.
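One way to apply that fix is to pass the worker a pointer to the spec rather than a paraphrase, then check the result against the spec's explicit item list. The sketch assumes the spec is a JSON file mapping each domain to an enumerated list of item names; that format and both function names are hypothetical.

```python
import json
from pathlib import Path


def build_worker_prompt(spec_path, domain):
    """Point the worker at the full spec on disk instead of summarizing it."""
    return (
        f"You own the '{domain}' domain. "
        f"Read the complete spec at {spec_path} before writing any code. "
        "Implement every item it enumerates."
    )


def missing_items(spec_path, domain, built):
    """Return spec items for the domain that the worker did not build."""
    spec = json.loads(Path(spec_path).read_text())
    return [name for name in spec[domain] if name not in built]
```

Because the prompt carries only a path, no intermediate layer can round "80 weapons" down to "some weapons". The enumerated list also makes a shortfall mechanically detectable after the worker finishes.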
Compaction recovery is robust with good summaries: Zero task relapse across 11 compaction events. The model states expected state, then reads disk to verify.
Failure Modes and Fixes
- Abstraction reflex: Builds an orchestrator instead of orchestrating. Fix: name the failure mode in the prompt.
- Self-model error: Claims capabilities it does not have. Fix: document the available tools explicitly.
- Identity paradox: Can't hold dual roles. Fix: use separate model instances.
- Delegation compression: Quantitative requirements get lost between delegation layers. Fix: use enumerative specs plus filesystem access.
📖 Read the full source: r/ClaudeAI