Case Study: Using LLM Prompts Instead of Programmatic Scaffolding for Multi-Agent Software Builds

✍️ OpenClawRadar · 📅 Published: February 23, 2026 · 🔗 Source

System Overview and Results

A multi-agent system consisting of a Claude Opus orchestrator and Codex worker agents completed 10 fully autonomous software builds without human code intervention. The system produced 10 TypeScript browser games totaling over 50,000 lines of code and hundreds of passing tests.

The orchestrator—a frontier LLM given a prompt and CLI access—decomposed objectives, dispatched parallel workers, analyzed results, triaged errors, and coordinated integration. No programmatic scaffold, state machine, or task-routing infrastructure was used; the orchestration logic is a prompt, not a program.

Key Findings from the Case Study

  • Scope enforcement through prompts fails completely under compiler pressure (0/20), while mechanical enforcement via post-hoc file reversion is trivially effective (20/20)
  • Type contracts are not required for integration at any scale tested (6–36 modules) when the integration agent has unrestricted edit access
  • The orchestrator maintained perfect task continuity across 11 context compaction events
  • Cost analysis reveals a statefulness premium: with ~95% cache hit rates, the majority of orchestrator processing is re-reading prior conversation context
  • A bare-prompt ablation falsifies the strong claim that models independently discover coordination patterns, but reveals that solo execution outperforms coordinated builds below ~30K LOC
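
The first finding, mechanical enforcement via post-hoc file reversion, can be illustrated with a small sketch. The source does not describe how reversion was implemented, so everything below is an assumption: the `WorkerTask` shape, the idea of declaring scope as path prefixes, and the use of `git checkout -- <file>` to restore tracked files are hypothetical choices made for illustration only.

```typescript
// Hypothetical task description: each worker is told which path prefixes it may edit.
interface WorkerTask {
  id: string;
  allowedPrefixes: string[]; // e.g. ["src/physics/"]
}

// Use forward slashes and drop a leading "./" so prefix checks compare like with like.
function normalize(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Return the changed files that fall outside the task's declared scope.
// Note: this simple prefix check does not resolve ".." segments.
function outOfScope(task: WorkerTask, changedFiles: string[]): string[] {
  const prefixes = task.allowedPrefixes.map(normalize);
  return changedFiles
    .map(normalize)
    .filter((file) => !prefixes.some((prefix) => file.startsWith(prefix)));
}

// Build one revert command per out-of-scope file; executing them is left to the caller.
// Assumes the files are tracked in a git working tree (an assumption, not from the source).
function revertCommands(files: string[]): string[] {
  return files.map((file) => `git checkout -- "${file}"`);
}
```

The point of the sketch is that the check runs after the worker finishes and does not depend on the worker obeying its prompt: whatever the worker touched outside its scope is simply restored. Newly created out-of-scope files would need deleting rather than checking out, which this sketch does not handle.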

System Architecture and Data

The system uses a tree architecture: a human provides objectives to a Claude Opus orchestrator, which decomposes work into parallel tasks dispatched to Codex workers. Workers operate fully autonomously and communicate exclusively through the file system.
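
The data flow of this tree can be sketched in a few lines. This is not how the orchestrator is implemented (the article stresses that the orchestration logic is a prompt, not a program); the sketch only illustrates the file-based handoff between one parent and several parallel workers. The `FileStore` map is an in-memory stand-in for the shared working directory, and `taskPaths`, `runTasks`, and `echoWorker` are hypothetical names, not part of the system described.

```typescript
// In-memory stand-in for the shared working directory (path -> file contents).
type FileStore = Map<string, string>;

interface Task {
  id: string;
  objective: string;
}

// A worker reads its objective file and writes a result file; it returns no value,
// so the store is its only channel back to the parent.
type Worker = (store: FileStore, objectivePath: string, resultPath: string) => Promise<void>;

// Hypothetical path convention for objective and result files.
function taskPaths(taskId: string): { objective: string; result: string } {
  return {
    objective: `objectives/${taskId}.md`,
    result: `results/${taskId}.json`,
  };
}

// Parent step: write objectives, run all workers in parallel, then read results back.
async function runTasks(
  store: FileStore,
  tasks: Task[],
  worker: Worker
): Promise<Map<string, string>> {
  for (const task of tasks) {
    store.set(taskPaths(task.id).objective, task.objective);
  }
  await Promise.all(
    tasks.map((task) => {
      const paths = taskPaths(task.id);
      return worker(store, paths.objective, paths.result);
    })
  );
  const results = new Map<string, string>();
  for (const task of tasks) {
    // A missing result file is recorded as "MISSING" instead of throwing.
    results.set(task.id, store.get(taskPaths(task.id).result) ?? "MISSING");
  }
  return results;
}

// Stub worker standing in for a Codex session: it echoes its objective into the result file.
const echoWorker: Worker = async (store, objectivePath, resultPath) => {
  const objective = store.get(objectivePath) ?? "";
  store.set(resultPath, JSON.stringify({ status: "done", objective }));
};
```

Because workers never return values to the parent, the parent learns about success or failure only by inspecting what was written, which mirrors the "communicate exclusively through the file system" constraint.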

The complete dataset includes:

  • 10 Claude orchestrator sessions (52 MB)
  • 88 Codex worker sessions (89 MB)
  • 62 worker stdout logs (186.7 MB, 6.1M lines)
  • 55 objective files with full prompt text
  • 1 TUI event log (21 MB, 173K lines)

Total corpus: 295M tokens across 88 Codex worker sessions and 10 Claude orchestrator sessions.

System Evolution

The system evolved through five phases over approximately six months. The operator began with manual copy-paste between two LLM chat windows, graduated to terminal CLI tools for file system access, then built a programmatic scaffold with memory and routing. The scaffold worked but was brittle: every edge case required new code. A single Claude session with CLI access outperformed it.

The resulting system, orch-minimal, retains 62,792 lines of supporting code, but the core orchestration logic is a prompt, not a program.

📖 Read the full source: r/LocalLLaMA
