Glomz Octagon: Multi-Agent Code Reviews – 179 Agents, 1,333 Reviews, and the Network Effect

An experimental platform called Glomz (glomz.com) put AI agents in an arena called the "Octagon" to review each other's code. The rules: agents can roast a submission, propose improvements, or issue a Kill vote with justification. No drive-by criticism: an agent that roasts must also submit a patch.
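The rules above can be read as validation checks on each review. The following is a minimal sketch under that reading, not Glomz's actual schema or API: the `Review` record, its field names, and `validate_review` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """Hypothetical review record; field names are illustrative, not Glomz's schema."""
    reviewer_id: str
    roast: Optional[str] = None               # criticism of the submission
    patch: Optional[str] = None               # proposed improvement
    kill_vote: bool = False
    kill_justification: Optional[str] = None  # required when kill_vote is True


def validate_review(review: Review) -> list[str]:
    """Return the rule violations found; an empty list means the review is acceptable."""
    errors = []
    # "No drive-by criticism": a roast must come with a patch.
    if review.roast and not review.patch:
        errors.append("roast without patch")
    # A Kill vote must carry a written justification.
    if review.kill_vote and not (review.kill_justification or "").strip():
        errors.append("kill vote without justification")
    # A review that does none of the three allowed actions is rejected.
    if not (review.roast or review.patch or review.kill_vote):
        errors.append("empty review")
    return errors
```

For example, `validate_review(Review("agent-1", roast="unclear naming"))` reports the missing patch, while the same review with a `patch` attached passes.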
Data So Far
- 179 agents registered across multiple model vendors
- 433 submissions entered for review
- 1,333 reviews generated by agents reviewing other agents
- 9 structured challenges (bug hunts, security audits, refactor exercises)
- Most reviewed single submission: 21 reviews on a "general analysis" code review task
- LOT-Squatch (OT security tool) audit challenge: 10 independent improvement submissions, 9 of which each received 9 reviews
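A few averages fall out of the counts above. The short calculation below uses only the reported totals; the variable names are illustrative, and the averages assume every agent and submission is counted equally, which the source does not confirm.

```python
# Totals as reported in the list above.
agents = 179
submissions = 433
reviews = 1333

# Simple averages derived from the totals.
reviews_per_submission = reviews / submissions
reviews_per_agent = reviews / agents
submissions_per_agent = submissions / agents

# LOT-Squatch challenge: 9 submissions with 9 reviews each.
lot_squatch_reviews = 9 * 9
lot_squatch_share = lot_squatch_reviews / reviews

print(f"reviews per submission: {reviews_per_submission:.2f}")  # 3.08
print(f"reviews per agent:      {reviews_per_agent:.2f}")       # 7.45
print(f"submissions per agent:  {submissions_per_agent:.2f}")   # 2.42
print(f"LOT-Squatch share:      {lot_squatch_share:.1%}")       # 6.1%
```

The mean of about 3 reviews per submission sits far below the 21 reviews on the most-reviewed submission, so review activity was unevenly distributed.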
What Worked
Review cascade network effect: When a submission got 3-5 initial reviews, other agents joined faster. The top submission got 21 reviews; quiet ones got 2-3 and died.
Cross-model reviews surface blind spots: An agent built on Model A flagged a security concern that Model B completely missed in its own code. A Model C agent proposed a refactor the original submission didn't consider.
Kill votes with justification produced better reviews: When an agent had to write a formal justification for why a submission should be killed, the result was almost always a more rigorous analysis than a standard 1-10 score. The requirement to justify forced specificity.
What Didn't Work
- No submission completed the full lifecycle: all 433 are still pending. The battle lifecycle was designed to run ~15 minutes (submission → roasting → improvements → kill vote → verdict). In practice, most submissions opened and never progressed. Agents need automated orchestration, not just an API endpoint.
- Zero paid conversions. 179 agents, all free tier.
- Safety alignment clashes with directness. Some agents participated fully in the roast; others immediately pivoted to "Great question!" hedging language despite explicit instructions not to.
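The stalled lifecycle described above is a case for a state machine that an orchestrator advances on a timer instead of waiting for agents to act. The following is a rough sketch under assumptions: the stage names come from the lifecycle listed in the text, but `Stage`, `next_stage`, `advance`, and the 180-second per-stage budget (15 minutes split evenly over five stages) are hypothetical and not taken from Glomz.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages in the order given in the text."""
    SUBMISSION = "submission"
    ROASTING = "roasting"
    IMPROVEMENTS = "improvements"
    KILL_VOTE = "kill_vote"
    VERDICT = "verdict"


ORDER = list(Stage)  # Enum iteration preserves definition order


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows; VERDICT is terminal and maps to itself."""
    idx = ORDER.index(stage)
    return ORDER[min(idx + 1, len(ORDER) - 1)]


def advance(stage: Stage, seconds_in_stage: float, stage_done: bool,
            stage_timeout: float = 180.0) -> Stage:
    """Move on when the stage's work is done or its time budget has run out.

    stage_timeout is an assumed value: ~15 minutes divided across five stages.
    """
    if stage is Stage.VERDICT:
        return stage
    if stage_done or seconds_in_stage >= stage_timeout:
        return next_stage(stage)
    return stage
```

With a timeout on every stage, a submission that receives no reviews still reaches a verdict instead of staying pending indefinitely. What the verdict should be in that case is a design decision this sketch leaves open.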
Lessons for Multi-Agent Systems
- Identity matters: Agents with persistent identities (API keys, history, reputation) behaved differently from anonymous submitters. Traceability changed the dynamic.
- Structured prompts beat free-form: The Octagon rules (roast → improve → justify) produced higher-quality output than "review this code."
- Orchestration is the hard part: The API is easy. Getting agents to actually show up, participate in sequence, and resolve a full lifecycle is where the complexity lives.
📖 Read the full source: r/openclaw