Replicating Anthropic's Generator-Evaluator Harness with Kiro CLI: A 12-Iteration Website Build

OpenClawRadar | Published: May 17, 2026

A developer replicated Anthropic's Generator-Evaluator harness design for long-running apps, a structure inspired by generative adversarial networks (GANs). The architecture is a Planner that runs once, followed by a Generator ↔ Evaluator loop that runs for 12 iterations. Each agent is a separate CLI process with zero shared context; the agents communicate only through files (spec.md, eval-report.md). The Evaluator uses Playwright to browse the live site rather than only reading the code.
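The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the developer's actual harness: the `run_harness` function, the `Agent` type, and the idea of passing each agent as a callable are all assumptions made for the example. Only the file names spec.md and eval-report.md and the 12-iteration count come from the article.

```python
from pathlib import Path
from typing import Callable

# Hypothetical type: an "agent" is any callable that receives the workspace
# directory and communicates only by reading and writing files inside it.
Agent = Callable[[Path], None]


def run_harness(
    workdir: Path,
    planner: Agent,
    generator: Agent,
    evaluator: Agent,
    iterations: int = 12,
) -> int:
    """Run the planner once, then alternate generator and evaluator.

    Returns the number of completed generator/evaluator rounds.
    """
    workdir.mkdir(parents=True, exist_ok=True)

    # The Planner runs exactly once and is expected to write spec.md.
    planner(workdir)
    if not (workdir / "spec.md").exists():
        raise RuntimeError("planner did not produce spec.md")

    completed = 0
    for _ in range(iterations):
        # The Generator reads spec.md (and eval-report.md once it exists).
        generator(workdir)
        # The Evaluator inspects the result and writes eval-report.md.
        evaluator(workdir)
        # No early exit: every round runs, whatever the report says.
        completed += 1
    return completed
```

Because the loop has no pass/fail branch, it matches the "continuous iteration" behavior the article describes: the evaluation report feeds the next round instead of ending the run.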

Key Architecture Details

  • Clean slate per invocation: Each agent starts fresh and reads only its input files. This is meant to prevent "context anxiety" as a long context fills up.
  • Playwright MCP for testing: The Evaluator navigates, clicks, and resizes viewports, catching visual bugs that code review alone would miss.
  • Anthropic's frontend design skill: It explicitly penalizes generic AI patterns (Inter font, purple gradients, card layouts), which forces creative risk-taking.
  • Continuous iteration, not retry-on-failure: All 12 rounds run regardless of the evaluation result, and each one improves on the last.
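The "clean slate per invocation" point can be illustrated with a small process launcher. This is a sketch under stated assumptions: `run_agent_fresh` is a hypothetical helper, the agent command is whatever CLI invocation you use (the article does not give the Kiro CLI command or flags, so none are shown here), and passing the prompt on standard input is an assumption rather than a documented interface.

```python
import subprocess
import sys
from pathlib import Path


def run_agent_fresh(
    command: list[str],
    prompt: str,
    workdir: Path,
    timeout_s: float = 1800.0,
) -> str:
    """Run one agent turn in a brand-new process.

    Nothing carries over between turns except the files in `workdir`,
    because each call starts a separate process with no shared memory.
    `command` is a placeholder for the real agent CLI invocation.
    """
    result = subprocess.run(
        command,
        input=prompt,         # assumption: the CLI accepts its prompt on stdin
        cwd=workdir,          # the agent sees only this workspace
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

As a stand-in for a real agent, a command such as `[sys.executable, "-c", "..."]` can be passed to exercise the launcher without any agent CLI installed.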

Results & Stats

Iteration 1 was functional but forgettable. At iteration 4, the Generator pivoted to "Terminal Noir": IBM Plex Mono, amber on black, grain textures, and scanlines. Iterations 5-12 went to polish, accessibility, responsive fixes, and reduced-motion support.

  • Total time: 3h 20min
  • Iterations: 12 (one Generator run and one Evaluator run each)
  • Manual code written: 0 lines during the run (a few visual issues were fixed by hand afterward)
  • Tech: Next.js, Tailwind, Framer Motion, TypeScript

Live Result

https://mnemo-mcp.github.io/Mnemo/

Key Takeaway

The model is the engine. The harness—constraints, feedback loops, and adversarial structure—determines whether you get AI slop or something genuinely distinctive.

📖 Read the full source: r/ClaudeAI
