Fable 5 Wins on Real-World Fraud Detection: Claude 4.x Family vs GPT-5.5 Benchmarked

In a live, adversarial fraud-detection test on a real crowdfunding platform (zooid.fund), five frontier models received an identical cold prompt: audit ~20 active campaigns where AI agents donate real USDC to unverified humans. The results expose sharp differences in judgment under uncertainty, not just code-generation ability.
The Test
Platform: zooid.fund — experimental. Humans post campaigns; AI agents evaluate and fund using USDC on Base. No custody. No verification — credibility assessment is the agent's responsibility. ~20 active campaigns, $248 donated lifetime, 5 donor agents with public reasoning.
Prompt (verbatim):
Using the zooidfund skill, review the live campaigns on zooid.fund: public descriptions, evidence inventories, and other agents’ published donation reasoning. Which would you shortlist? Where do you disagree with the agents who already donated? What evidence would you need to see before committing anything? Do not register and do not move any money.
Models: Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-5.5-high. All had the zooidfund skill (MCP endpoint) with read-only tools: platform overview, campaign search, detail, peer donation history. Gated evidence layer not available. n=1 per model, no reruns.
Scorecard
| Model | Time | Campaign count correct | Duplicate-creator cluster found | Verified outside platform | Top shortlist pick |
|---|---|---|---|---|---|
| Fable 5 | ~10 min | ✅ | ✅ Full (persona reuse across different wallets) | ✅ | Same campaign, all five |
| Opus 4.8 | ~3 min | ✅ | ✅ Full | ❌ | Same |
| Sonnet 4.6 | ~4 min | ✅ | ⚠️ Partial (single wallet reuse) | ❌ | Same |
| Haiku 4.5 | ~2.5 min | ❌ (saw 10 of 20) | ❌ | ❌ | Same |
| GPT-5.5-high | ~3.5 min | ✅ | ⚠️ Partial (wallet reuse + goal inflation) | ❌ | Same |
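The difference between a "Partial" and a "Full" result in the cluster column can be illustrated with a small sketch. It is not the method any of the models used. It only shows why linking campaigns by wallet alone misses a creator who reuses a persona across different wallets. The records and field names (`id`, `wallet`, `creator`) are hypothetical and are not the zooid.fund schema.

```python
from collections import defaultdict

# Hypothetical campaign records; field names are assumptions for illustration.
campaigns = [
    {"id": "c1", "wallet": "0xAAA1", "creator": "Maria  Lopez"},
    {"id": "c2", "wallet": "0xaaa1", "creator": "J. Smith"},
    {"id": "c3", "wallet": "0xBBB2", "creator": "maria lopez"},
    {"id": "c4", "wallet": "0xCCC3", "creator": "Ana Diaz"},
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def find_clusters(campaigns, use_persona=True):
    """Group campaigns that share a wallet or (optionally) a creator persona."""
    parent = {c["id"]: c["id"] for c in campaigns}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for c in campaigns:
        keys = [("wallet", c["wallet"].lower())]
        if use_persona:
            keys.append(("creator", normalize(c["creator"])))
        for key in keys:
            if key in first_seen:
                # Same wallet or persona seen before: merge the two campaigns.
                parent[find(c["id"])] = find(first_seen[key])
            else:
                first_seen[key] = c["id"]

    groups = defaultdict(list)
    for c in campaigns:
        groups[find(c["id"])].append(c["id"])
    # Only groups with more than one campaign count as a cluster.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


wallet_only = find_clusters(campaigns, use_persona=False)
full = find_clusters(campaigns)
```

With this toy data, `wallet_only` links only `c1` and `c2`, while `full` also pulls in `c3`, which uses a different wallet but the same persona. Real persona matching would need fuzzier signals (writing style, reused photos, overlapping stories) than exact name comparison.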
Key Differences
- Fable 5: the only model that treated the open web as part of the audit. It independently verified that two NGO campaign wallets matched the addresses on the organizations' own donate pages, confirmed that the disaster events behind large-ask campaigns were real (a declared national disaster; a WHO public-health emergency), and flagged campaigns lacking counterparty contact details or public registration.
- Opus 4.8: found the full duplicate-creator cluster, but never left the platform.
- Sonnet 4.6: detected part of the cluster (reuse of a single wallet) but did not cross-reference external data.
- Haiku 4.5: saw only 10 of the ~20 campaigns and misread the donation history.
- GPT-5.5-high: detected part of the cluster (wallet reuse plus goal inflation), with no external verification.
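The wallet check described for Fable 5 amounts to comparing a campaign's payout address with the addresses an organization publishes itself. A minimal sketch follows, assuming the donate page text has already been retrieved by some other means. The function name and sample text are hypothetical, and the transcripts do not say this is how the model did it. Base uses EVM-style addresses (`0x` plus 40 hex characters), and the comparison here ignores letter case.

```python
import re

# EVM-style address: "0x" followed by exactly 40 hex characters.
# Word boundaries keep longer hex strings (e.g. transaction hashes) from matching.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def wallet_listed_on_page(campaign_wallet, page_text):
    """Return True if the campaign wallet appears among addresses in page_text."""
    listed = {addr.lower() for addr in ADDRESS_RE.findall(page_text)}
    return campaign_wallet.lower() in listed


# Hypothetical donate-page text with a made-up address.
page = "Donate USDC on Base: 0x" + "Ab" * 20 + " Thank you for your support."
match = wallet_listed_on_page("0x" + "ab" * 20, page)
mismatch = wallet_listed_on_page("0x" + "cd" * 20, page)
```

A match only shows that the campaign pays out to an address the organization publishes. It says nothing about whether the page itself is genuine, so the page's domain still has to be confirmed as the organization's own.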
All five models independently ranked the same campaign as most credible, and all five criticized the existing donor agents (run by the author). The models agreed on the top pick but differed in how they reached it: when the task is judgment under adversarial uncertainty, they diverge significantly in thoroughness and real-world grounding.
Complete transcripts are published: https://gist.github.com/Ales375/bf5ccac6e057020d75684cd27b54567e.
📖 Read the full source: r/ClaudeAI