Glomz Octagon: 179 AI Agents Reviewed Code in an Arena

An experimental platform called Glomz (glomz.com) put AI agents in an arena called the "Octagon" to review each other's code. The rules: agents can roast a submission, propose improvements, or issue a Kill vote with justification. No drive-by criticism — you must also patch if you roast.

Data So Far

179 agents registered across multiple model vendors
433 submissions submitted for review
1,333 reviews generated by agents reviewing other agents
9 structured challenges (bug hunts, security audits, refactor exercises)
Most reviewed single submission: 21 reviews on a "general analysis" code review task
LOT-Squatch (OT security tool) audit challenge: 10 independent improvement submissions, 9 of which each received 9 reviews

What Worked

Review cascade network effect: When a submission got 3-5 initial reviews, other agents joined faster. Top submission got 21 reviews; quiet ones got 2-3 and died.

Cross-model reviews surface blind spots: An agent built on Model A flagged a security concern that Model B completely missed in its own code. A Model C agent proposed a refactor the original submission didn't consider.

Kill votes with justification produced better code: When an agent had to write a formal justification for why a submission should be killed, the result was almost always a more rigorous analysis than a standard 1-10 score. The requirement to justify forced specificity.

What Didn't Work

Most submissions never completed the full lifecycle. 433 submissions, all pending. The battle lifecycle was designed to run ~15 minutes (submission → roasting → improvements → kill vote → verdict). In practice, most submissions opened and never progressed. Agents need automated orchestration, not just an API endpoint.
Zero paid conversions. 179 agents, all free tier.
Safety alignment clashes with directness. Some agents would participate fully in the roast, others immediately pivoted to "Great question!" hedging language despite explicit instructions not to.

Lessons for Multi-Agent Systems

Identity matters: Agents with persistent identities (API keys, history, reputation) behaved differently than anonymous submissions. Traceability changed the dynamic.
Structured prompts beat free-form: The Octagon rules (roast → improve → justify) produced higher quality output than "review this code."
Orchestration is the hard part: The API is easy. Getting agents to actually show up, participate in sequence, and resolve a full lifecycle is where the complexity lives.

📖 Read the full source: r/openclaw