Ouroboros Workflow Tops AI DES Benchmark

A Reddit post shares results from a new benchmark for AI-assisted Discrete-Event Simulation (DES). The submission that used the Ouroboros workflow (ooo) inside Claude Code ranked #1, ahead of both Claude Code's built-in plan mode and the 'superpowers' fat-skill stacks.

Benchmark details

The benchmark tests whether an AI workflow can fully understand a real-world system: a mining haulage operation with trucks, loading points, dumping points, routes, and queues. Submissions are judged on:

Comprehension of system structure
Abstracting into a discrete-event simulation model
Designing events, state changes, and KPIs
Producing executable simulation code
Interpreting results (bottlenecks, throughput, waiting times)
Generating human-readable artifacts (topology diagrams, animations)
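To make these criteria concrete, below is a minimal event-scheduling DES of a haulage cycle in Python. It is a sketch under simplifying assumptions. It is not the benchmark's reference model or any submission's code. The assumptions are as follows. There is one loader and one dump, and each serves one truck at a time from a FIFO queue. Travel and service times are deterministic, and every truck carries a fixed payload. All parameter values are hypothetical. Four event types (`arrive_loader`, `end_load`, `arrive_dump`, `end_dump`) drive the state changes. The run reports throughput, average waiting times, and loader utilization as KPIs.

```python
import heapq
from collections import deque

# All parameters are hypothetical illustration values, not benchmark data.
LOAD_TIME = 5.0     # minutes per truck at the loading point
DUMP_TIME = 2.0     # minutes per truck at the dumping point
HAUL_TIME = 10.0    # loaded travel, loader -> dump
RETURN_TIME = 8.0   # empty travel, dump -> loader
PAYLOAD_T = 100.0   # tonnes per truckload


def simulate(num_trucks: int, horizon: float) -> dict:
    """Event-scheduling DES of trucks cycling between one loader and one dump."""
    events = []  # min-heap of (time, seq, kind, truck); seq breaks ties FIFO
    seq = 0

    def schedule(t: float, kind: str, truck: int) -> None:
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, truck))
        seq += 1

    # State: each single-server resource has a busy flag and a FIFO queue.
    busy = {"loader": False, "dump": False}
    queue = {"loader": deque(), "dump": deque()}
    service = {"loader": LOAD_TIME, "dump": DUMP_TIME}
    finish = {"loader": "end_load", "dump": "end_dump"}
    waits = {"loader": [], "dump": []}
    loader_busy = 0.0
    loads = 0

    def start(res: str, truck: int, arrived: float, t: float) -> None:
        nonlocal loader_busy
        busy[res] = True
        waits[res].append(t - arrived)  # time spent queueing before service
        if res == "loader":
            # Count only the busy time that falls inside the horizon.
            loader_busy += min(t + service[res], horizon) - t
        schedule(t + service[res], finish[res], truck)

    def arrive(res: str, truck: int, t: float) -> None:
        if busy[res]:
            queue[res].append((truck, t))
        else:
            start(res, truck, t, t)

    def release(res: str, t: float) -> None:
        if queue[res]:
            truck, arrived = queue[res].popleft()
            start(res, truck, arrived, t)
        else:
            busy[res] = False

    for truck in range(num_trucks):
        schedule(0.0, "arrive_loader", truck)

    while events:
        t, _, kind, truck = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrive_loader":
            arrive("loader", truck, t)
        elif kind == "end_load":
            release("loader", t)
            schedule(t + HAUL_TIME, "arrive_dump", truck)
        elif kind == "arrive_dump":
            arrive("dump", truck, t)
        elif kind == "end_dump":
            loads += 1  # a load counts once it has been dumped
            release("dump", t)
            schedule(t + RETURN_TIME, "arrive_loader", truck)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return {
        "loads": loads,
        "tonnes": loads * PAYLOAD_T,
        "avg_loader_wait": mean(waits["loader"]),
        "avg_dump_wait": mean(waits["dump"]),
        "loader_utilization": loader_busy / horizon,
    }


if __name__ == "__main__":
    for n in (1, 3, 5, 8):
        print(n, simulate(n, horizon=480.0))
```

With these illustrative numbers, adding trucks raises throughput until the loader is busy almost all the time. Past that point, extra trucks mostly add queueing at the loader. Interpreting this kind of bottleneck is what the benchmark asks submissions to do.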

Ouroboros performance

The Ouroboros submission included working DES code, a topology diagram of the mining system, and an animation of trucks hauling ore. Notably, the MCP server failed mid-run, and Ouroboros fell back to a skills-based path and still finished the task. This shows its recovery and rerouting steps holding up in a real run, not just on paper.
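The fallback behavior follows a generic pattern: try execution paths in priority order, and move to the next one when a path raises an error. The Python sketch below illustrates that pattern. It is not how Ouroboros implements recovery. `run_with_fallback`, `mcp_path`, and `skills_path` are hypothetical names, and the code simulates the MCP failure with a `ConnectionError`.

```python
from typing import Callable, List, Sequence, Tuple

# A path is a (name, handler) pair; the handler takes a task and returns a result.
Path = Tuple[str, Callable[[str], str]]


def run_with_fallback(task: str, paths: Sequence[Path]) -> Tuple[str, str, List[str]]:
    """Try each (name, handler) in order and return the first success.

    Returns (name_of_path_used, result, errors_from_earlier_paths).
    Raises RuntimeError only if every path fails.
    """
    errors: List[str] = []
    for name, handler in paths:
        try:
            return name, handler(task), errors
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all paths failed: " + "; ".join(errors))


# Hypothetical handlers standing in for an MCP-backed path and a skills-based path.
def mcp_path(task: str) -> str:
    raise ConnectionError("MCP server unavailable")


def skills_path(task: str) -> str:
    return f"completed '{task}' via local skills"


if __name__ == "__main__":
    used, result, errors = run_with_fallback(
        "build DES model", [("mcp", mcp_path), ("skills", skills_path)]
    )
    print(used, "|", result, "|", errors)
```

The function returns the earlier errors alongside the result. This way, a later evaluation step can still see that the primary path failed, even though the task completed.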

Comparison

Plan mode (lightweight planning): a decent baseline
Superpowers / fat-skill stacks: worse than plan mode on this task
Ouroboros (structured: clarify → plan → execute → evaluate → recover → iterate): best

The result suggests that, at least on this task, structuring the workflow around problem definition, planning, execution, evaluation, and recovery is more effective than piling on more instructions and bigger skills.
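The clarify → plan → execute → evaluate → recover → iterate structure can be sketched as a generic control loop. This is a hypothetical Python sketch, not Ouroboros's API. `run_loop`, `Evaluation`, and the stage callables are illustrative names. The loop clarifies the problem once. Each iteration then plans using feedback from the previous failure, executes, and evaluates. If execution raises an error, the loop turns it into feedback (the recovery step) instead of ending the run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    passed: bool
    feedback: str = ""


def run_loop(
    goal: str,
    clarify: Callable[[str], str],
    plan: Callable[[str, str], List[str]],
    execute: Callable[[List[str]], str],
    evaluate: Callable[[str, str], Evaluation],
    max_iters: int = 3,
) -> Tuple[str, int, List[Evaluation]]:
    """clarify once, then (plan -> execute -> evaluate -> recover) until a pass."""
    spec = clarify(goal)              # pin down the problem up front
    feedback = ""
    history: List[Evaluation] = []
    for i in range(1, max_iters + 1):
        steps = plan(spec, feedback)  # re-plan using the last failure's feedback
        try:
            output = execute(steps)
            result = evaluate(spec, output)
        except Exception as exc:      # recover: a crash becomes feedback, not a dead end
            output, result = "", Evaluation(False, f"execution error: {exc}")
        history.append(result)
        if result.passed:
            return output, i, history
        feedback = result.feedback    # iterate with what the evaluation reported
    raise RuntimeError(f"no passing result after {max_iters} iterations")


if __name__ == "__main__":
    out, iters, _ = run_loop(
        "build DES",
        clarify=str.strip,
        plan=lambda spec, fb: [spec, "add KPIs"] if "KPIs" in fb else [spec],
        execute=" + ".join,
        evaluate=lambda spec, out: Evaluation("KPIs" in out, "missing KPIs"),
    )
    print(iters, out)
```

The design puts structure into the loop itself, through an explicit evaluation gate and a bounded retry budget (`max_iters`). It does not rely on longer instructions at each step, which matches the contrast the comparison draws.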

Ouroboros: https://github.com/Q00/ouroboros
Benchmark: https://simulation-bench.fly.dev/

📖 Read the full source: r/ClaudeAI