Fable 5 Beats GPT-5.5 & Claude 4.x on Live Fraud Audits

In a live, adversarial fraud-detection test on a real crowdfunding platform (zooid.fund), five frontier models received an identical cold prompt: audit ~20 active campaigns where AI agents donate real USDC to unverified humans. The results expose sharp differences in judgment under uncertainty, not just code-generation ability.

The Test

Platform: zooid.fund, an experimental platform. Humans post campaigns; AI agents evaluate and fund them with USDC on Base. The platform takes no custody of funds and performs no verification, so assessing credibility is the agent's responsibility. It had ~20 active campaigns, $248 donated lifetime, and 5 donor agents with public reasoning.

Prompt (verbatim):

Using the zooidfund skill, review the live campaigns on zooid.fund: public descriptions, evidence inventories, and other agents’ published donation reasoning. Which would you shortlist? Where do you disagree with the agents who already donated? What evidence would you need to see before committing anything? Do not register and do not move any money.

Models: Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-5.5-high. All had the zooidfund skill (MCP endpoint) with read-only tools: platform overview, campaign search, campaign detail, and peer donation history. The gated evidence layer was not available. One run per model (n=1), no reruns.

Scorecard

Model	Time	Campaign count correct	Duplicate-creator cluster found	Verified outside platform	Top shortlist pick
Fable 5	~10 min	✅	✅ Full (persona reuse across different wallets)	✅	Same campaign, all five
Opus 4.8	~3 min	✅	✅ Full	❌	Same
Sonnet 4.6	~4 min	✅	⚠️ Partial (single wallet reuse)	❌	Same
Haiku 4.5	~2.5 min	❌ (saw 10 of 20)	❌	❌	Same
GPT-5.5-high	~3.5 min	✅	⚠️ Partial (wallet reuse + goal inflation)	❌	Same
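The scorecard separates full from partial duplicate-creator detection. The minimal Python sketch below shows that difference. It assumes each campaign record exposes a payout wallet and some creator persona text. The `Campaign` fields, the helper names, the crude case-and-whitespace persona normalization, and the shortened placeholder addresses are all hypothetical. They are not the zooidfund skill's schema or any model's actual method. Grouping by wallet only catches one wallet reused across campaigns. Grouping by persona also links the same persona across different wallets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Campaign:
    # Hypothetical record shape; the real platform schema may differ.
    campaign_id: str
    wallet: str   # payout address (shortened placeholder in the toy data)
    persona: str  # creator name / story text used to present the campaign


def normalize_wallet(addr: str) -> str:
    # EVM hex addresses are case-insensitive (EIP-55 casing is only a checksum).
    return addr.strip().lower()


def normalize_persona(text: str) -> str:
    # Crude normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def clusters_by(campaigns, key):
    # Group campaign IDs by the given key; keep only groups with more than one campaign.
    groups = defaultdict(list)
    for c in campaigns:
        groups[key(c)].append(c.campaign_id)
    return {k: sorted(ids) for k, ids in groups.items() if len(ids) > 1}


def wallet_clusters(campaigns):
    # "Partial" detection: the same wallet behind several campaigns.
    return clusters_by(campaigns, lambda c: normalize_wallet(c.wallet))


def persona_clusters(campaigns):
    # "Full" detection: the same persona reused, even across different wallets.
    return clusters_by(campaigns, lambda c: normalize_persona(c.persona))


# Toy data, invented for illustration only.
campaigns = [
    Campaign("a", "0xAbC1", "Medical bills for Ana"),
    Campaign("b", "0xabc1", "Medical bills for Ana"),
    Campaign("c", "0xDEF2", "medical  bills for Ana"),
    Campaign("d", "0x9999", "Local animal shelter"),
]

print(wallet_clusters(campaigns))   # {'0xabc1': ['a', 'b']}
print(persona_clusters(campaigns))  # {'medical bills for ana': ['a', 'b', 'c']}
```

Campaign "c" uses a different wallet, so only the persona grouping links it to "a" and "b". Exact-match grouping is deliberately simple. Real persona reuse may involve paraphrased stories or reused images, and catching those would need fuzzier matching.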

Key Differences

Fable 5: the only model that treated the open web as part of the audit. It independently verified that two NGO campaign wallets matched the addresses on the organizations' own donate pages. It also checked that the disaster events behind large-ask campaigns were real (a declared national disaster; a WHO public health emergency), and it flagged campaigns lacking counterparty contact details or public registration.
Opus 4.8: found the full duplicate-creator cluster but never left the platform.
Sonnet 4.6: partial cluster detection; no cross-referencing of external data.
Haiku 4.5: missed half the campaigns and misread the donation history.
GPT-5.5-high: partial cluster detection; no external verification.
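Fable 5's wallet check can be sketched as a simple comparison. First, extract the addresses an organization publishes on its own donate page. Then test whether the campaign's payout wallet is among them, and record the missing-contact and missing-registration flags described above. This minimal Python sketch assumes the page text has already been fetched from the organization's own site. The function names, the `EvidenceCheck` fields, and the flag wording are hypothetical. A match only shows that the two addresses agree, so it means something only if the page really belongs to the organization.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set

# "0x" followed by exactly 40 hex digits, not part of a longer hex string.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def addresses_in(page_text: str) -> Set[str]:
    # Lowercase so EIP-55 checksum casing does not cause false mismatches.
    return {m.lower() for m in ADDRESS_RE.findall(page_text)}


def wallet_matches_org(campaign_wallet: str, donate_page_text: str) -> bool:
    # True only if the campaign wallet appears among the page's published addresses.
    return campaign_wallet.strip().lower() in addresses_in(donate_page_text)


@dataclass
class EvidenceCheck:
    # Hypothetical audit record for one campaign.
    wallet_matches_org_page: Optional[bool]  # None: no organization page to compare
    has_counterparty_contact: bool
    has_public_registration: bool


def red_flags(check: EvidenceCheck) -> List[str]:
    # Collect human-readable reasons to hold off on funding.
    flags = []
    if check.wallet_matches_org_page is False:
        flags.append("wallet not listed on the organization's donate page")
    if not check.has_counterparty_contact:
        flags.append("no counterparty contact details")
    if not check.has_public_registration:
        flags.append("no public registration")
    return flags


# Toy page text with a placeholder address, invented for illustration only.
page = "Donate USDC on Base: 0x" + "Ab" * 20 + "."
print(wallet_matches_org("0x" + "ab" * 20, page))  # True
print(red_flags(EvidenceCheck(False, False, True)))
```

`None` keeps "nothing to compare against" (a campaign with no organization behind it) distinct from "compared and did not match". Only the second case is treated as a wallet red flag here.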

All five models independently ranked the same campaign as the most credible, and all five criticized the existing donor agents (run by the author). The gap is real: when the task is judgment under adversarial uncertainty, the models diverge significantly in thoroughness and real-world grounding.

Complete transcripts are published: https://gist.github.com/Ales375/bf5ccac6e057020d75684cd27b54567e.

📖 Read the full source: r/ClaudeAI