Replicating Anthropic's Generator-Evaluator Harness with Kiro CLI: A 12-Iteration Website Build

✍️ OpenClawRadar · 📅 Published: May 17, 2026 · 🔗 Source

A developer replicated Anthropic's GAN-inspired Generator-Evaluator harness design for long-running app development. The architecture consists of a Planner that runs once, followed by a Generator ↔ Evaluator loop that runs for 12 iterations. Each agent is a separate CLI process with zero shared context, and agents communicate only through files (spec.md, eval-report.md). The Evaluator uses Playwright to browse the live site rather than just reading the code.
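The file-only handoff can be pictured as a small table of which files each role reads and writes. The sketch below is illustrative, not the author's code. Only spec.md and eval-report.md come from the write-up. The Planner's input file `brief.md`, the exact read/write assignments, and the `build_prompt` helper are all hypothetical. The point is that each prompt is assembled from file contents alone, so nothing from an earlier process leaks into the next one.

```python
from pathlib import Path

# Hypothetical handoff table: which files each role reads and writes.
# Only spec.md and eval-report.md are named in the original write-up;
# brief.md and these exact assignments are assumptions for illustration.
HANDOFF = {
    "planner": {"reads": ["brief.md"], "writes": ["spec.md"]},
    # The Generator edits the site source directly, so it has no handoff file to write.
    "generator": {"reads": ["spec.md", "eval-report.md"], "writes": []},
    "evaluator": {"reads": ["spec.md"], "writes": ["eval-report.md"]},
}


def build_prompt(role: str, workdir: Path) -> str:
    """Assemble a role's prompt purely from its input files (hypothetical helper).

    Missing files are skipped (e.g. eval-report.md before the first evaluation),
    so the first Generator run sees only the spec.
    """
    sections = [f"You are the {role}."]
    for name in HANDOFF[role]["reads"]:
        path = workdir / name
        if path.exists():
            sections.append(f"--- {name} ---\n{path.read_text()}")
    if HANDOFF[role]["writes"]:
        sections.append("Write your output to: " + ", ".join(HANDOFF[role]["writes"]))
    return "\n\n".join(sections)
```

Because the prompt is a pure function of what is on disk, restarting any agent mid-run reproduces exactly the context it had before, which is what "zero shared context" buys.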

Key Architecture Details

Clean slate per invocation: each agent starts fresh and reads only its input files, which prevents context anxiety from building up over a long session.
Playwright MCP for testing: the Evaluator navigates pages, clicks elements, and resizes viewports, catching visual bugs that code review alone would likely miss.
Anthropic's frontend design skill: explicitly penalizes generic AI patterns (the Inter font, purple gradients, card layouts) and encourages creative risk-taking.
Continuous iteration, not retry-on-failure: all 12 rounds run regardless of the evaluation outcome, and each round builds on the previous one.
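The control flow described above (a one-off Planner, then a fixed number of rounds that run regardless of outcome) can be sketched as a fixed-length loop in which every agent call is a fresh OS process. This is a hedged illustration, not the author's harness. The source does not give the actual Kiro CLI invocation, so `agent_cmd` is a placeholder you would fill in, and passing the prompt on stdin is an assumption. `make_cli_runner`, `run_harness`, and `prompt_for` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# A runner performs one agent invocation (role, prompt, working directory).
Runner = Callable[[str, str, Path], None]


def make_cli_runner(agent_cmd: List[str]) -> Runner:
    """Return a runner that starts a brand-new process for every call.

    agent_cmd is a placeholder for your agent CLI and its flags; the real
    Kiro CLI flags are not specified in the write-up.
    """
    def run(role: str, prompt: str, workdir: Path) -> None:
        # New process per call, prompt on stdin: no memory carries over between calls.
        subprocess.run(agent_cmd, input=prompt, text=True, cwd=workdir, check=True)
    return run


def run_harness(workdir: Path, run: Runner,
                prompt_for: Callable[[str, Path], str],
                iterations: int = 12) -> List[str]:
    """Run the Planner once, then a fixed number of Generator -> Evaluator rounds."""
    log = ["planner"]
    run("planner", prompt_for("planner", workdir), workdir)
    for i in range(1, iterations + 1):
        # Continuous iteration: every round runs, even if the last report found
        # nothing to fix. A retry-on-failure loop would check the report and break early.
        for role in ("generator", "evaluator"):
            run(role, prompt_for(role, workdir), workdir)
            log.append(f"{role}:{i}")
    return log
```

Keeping the runner injectable makes the loop easy to dry-run with a fake agent before pointing it at a real CLI. The only state that survives between rounds is whatever the agents wrote to `workdir`.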

Results & Stats

Iteration 1: functional but forgettable.
Iteration 4: the Generator pivoted to a "Terminal Noir" look: IBM Plex Mono, amber on black, grain textures, and scanlines.
Iterations 5–12: polish, accessibility, responsive-layout fixes, and reduced-motion support.

Total time: 3h 20min
Iterations: 12 (one Generator run and one Evaluator run each)
Manual code written: 0 lines (a few visual issues were fixed afterward)
Tech: Next.js, Tailwind, Framer Motion, TypeScript

Live Result

https://mnemo-mcp.github.io/Mnemo/

Key Takeaway

The model is the engine. The harness (constraints, feedback loops, and adversarial structure) determines whether you get AI slop or something genuinely distinctive.

📖 Read the full source: r/ClaudeAI
