Practical Findings from 11 Multi-Agent Software Builds Without Programmatic Scaffolding

Key Technical Findings from Multi-Agent System Experiments
Analysis of 11 autonomous multi-agent software builds without programmatic scaffolding, based on 295M tokens, 98 agent sessions, and 6.1M lines of worker output, reveals practical insights for developers working with AI coding agents.
Scope Enforcement and Orchestration
Scope enforcement is solved mechanically, not through prompts: Prompt-based approaches failed 0/20 times under compiler pressure, while mechanical approaches (letting agents edit everything and using git revert for out-of-scope files) succeeded 20/20 times. The key insight: don't ask models to respect boundaries—enforce them after the fact.
Orchestrator costs are memory-bound: Approximately 95% of input spend is re-reading conversation history. The "statefulness premium" means a frontier orchestrator that writes zero shipped code can cost as much as the entire worker fleet. Optimization should target fewer turns and less re-ingestion, not cheaper reasoning.
Coordination and Scaling Dynamics
Models don't independently discover coordination: Bare-prompt Opus with full tool access never delegated, never wrote specs, and never discovered parallel dispatch—it just built everything solo. The coordination template does real work.
Depth scales differently than quality: Flat dispatch beats hierarchy at ≤10 domains on throughput, token efficiency, and wall time. Above 10 domains, hierarchy enables parallelism that flat dispatch can't achieve.
Solo outperforms coordination until context limits bind: Solo throughput is approximately 325 LOC/min and invariant to project size. Pyramid throughput scales with workers. Below ~30K LOC, delegation is pure overhead.
Worker Performance and Type Systems
Worker model capability drives throughput: Same architecture, same spec, three worker models produced: 17,761 LOC vs 6,001 vs 1,818—a 9.8x gap. Architecture enables parallel throughput; the worker model determines it.
Type contracts provide shared vocabulary: Integration succeeds without contracts at every scale tested (6–36 modules), even under read-only constraints. But without contracts, parallel workers silently produce structurally incompatible types that compile only because nothing cross-references. A single 984-line contract written blind held across 10 independent domains.
Type contracts eliminate coordination overhead at scale: Controlled scaling test (1–20 workers, fixed spec) showed zero integration errors across 50 domain builds. Sweet spot at 10 workers: 2.05x wall-time speedup. At 20 workers, serial phase dependencies negate parallelism gains (Amdahl's serial fraction ~44%).
Context and Delegation Patterns
Context priming works; format doesn't matter: 0% formula transfer cold, 100% with design context present (N=10 per condition). A static reference document produces identical transfer rates to a synthetic boot conversation.
Delegation compression is inherent: Each delegation layer acts as a lossy summarizer. Quantitative requirements ("80 weapons") vanish; structural requirements (type interfaces) survive. Fix: workers should read full specs from the filesystem rather than relying on compressed prompt chains.
Compaction recovery is robust with good summaries: Zero task relapse across 11 compaction events. The model states expected state, then reads disk to verify.
Failure Modes and Fixes
- Abstraction reflex: Builds an orchestrator instead of orchestrating—name it in the prompt
- Self-model error: Claims false capabilities—document available tools explicitly
- Identity paradox: Can't hold dual roles—use separate model instances
- Delegation compression: Use enumerative specs plus filesystem access
📖 Read the full source: r/ClaudeAI
👀 See Also

Dynamic Status Bar for Claude Code Shows Live Updates
A developer has improved their Claude Code status bar from static text to dynamic display with real-time updates showing what Claude is working on. The configuration is available as a GitHub gist.

Manifest Adds MiniMax Token Plans with M2.7 Model Support
Manifest, an open source routing layer for OpenClaw, now supports MiniMax token plans starting at $10/month. The new MiniMax M2.7 model is specifically built for OpenClaw workflows and achieves 62.7 on MM-ClawBench and 56.2 on SWE-Bench Pro.

Offline-web-search: A Local Google Search Alternative for AI Agents
A developer built offline-web-search to address poor offline search capabilities in AI agents, creating a drop-in replacement that mimics Claude's web tools with BM25 ranking, SQLite FTS5 indexing, and support for ZIM archives and custom crawlers.

nex-life-logger: Local Activity Tracker for OpenClaw Agents
nex-life-logger is a background activity tracker that runs locally on your machine, giving OpenClaw agents memory of your computer activities. It tracks browser history, active windows, and YouTube transcripts, storing everything in a local SQLite database with no cloud data transmission.