Two-Agent Haiku System Matches Claude Opus at 15x Lower Cost

Experimental Setup and Results

A Reddit user conducted a comparative test between two Claude model configurations on a challenging number theory problem. The problem required proving that for an odd prime p, the sum 1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1) is congruent to -1 (mod p), using Fermat's Little Theorem and properties of primitive roots.
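The congruence itself is easy to spot-check numerically. The following is a minimal sketch, not part of the original test; the helper name `power_sum_mod` and the choice of primes are illustrative assumptions.

```python
def power_sum_mod(p: int) -> int:
    """Return (1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1)) mod p."""
    # pow(a, e, p) is Python's built-in modular exponentiation.
    return sum(pow(a, p - 1, p) for a in range(1, p)) % p


# For a prime p, Fermat's Little Theorem makes every term 1 (mod p),
# so the sum is p - 1, which is congruent to -1 (mod p).
for p in (3, 5, 7, 11, 13, 101):
    assert power_sum_mod(p) == p - 1
```

A numeric check like this only confirms the statement for the primes tried; the proof the models were asked for has to cover every odd prime.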

Two configurations were tested:

Config X (Opus solo): Claude Opus 4.5 with max_tokens: 2048, no auditor
Config Y (Haiku multi-agent): a Haiku generator produces the full proof and a second Haiku auditor checks every step, with a second pass if the auditor flags anything; max_tokens: 1024 per call
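The control flow of Config Y can be sketched roughly as follows. This is an illustration under stated assumptions, not the Reddit user's code: `generate` and `audit` are hypothetical callables standing in for the two Haiku calls, the loop assumes at most two passes, and the way auditor feedback is fed back into the prompt is invented for the example. Only the VERIFIED verdict string comes from the source.

```python
from typing import Callable


def run_with_auditor(
    problem: str,
    generate: Callable[[str], str],    # hypothetical: prompt -> proof text
    audit: Callable[[str, str], str],  # hypothetical: (problem, proof) -> verdict
    max_passes: int = 2,               # assumption: at most two passes
) -> tuple[str, str]:
    """Generate a proof, audit it, and regenerate once if the auditor flags it."""
    prompt = problem
    proof, verdict = "", "UNCHECKED"
    for _ in range(max_passes):
        proof = generate(prompt)
        verdict = audit(problem, proof)
        if verdict == "VERIFIED":
            break
        # Assumed feedback format: append the auditor's objection to the prompt.
        prompt = f"{problem}\n\nAuditor feedback on the previous attempt:\n{verdict}"
    return proof, verdict
```

Passing the model calls in as plain callables keeps the loop independent of any particular client library, and makes it easy to exercise with stub functions before spending tokens.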

Scoring and Performance

Both configurations scored 4/4 using this rubric:

Correctly invokes Fermat's Little Theorem
Correctly handles primitive root argument
Summation over complete residue system valid
Congruence conclusion follows correctly

The Haiku auditor returned VERIFIED with no disagreement. Performance metrics:

Opus solo: ~8.7 seconds, score 4/4
Haiku + auditor: ~10.9 seconds, score 4/4

Cost Analysis

Cost is where the two configurations differ most:

Opus solo: $0.075/1000 tokens × ~800 tokens = ~$0.06 per query
Haiku + Haiku: $0.0025/1000 tokens × ~1600 tokens = ~$0.004 per query

This is roughly 15x lower cost for the same 4/4 score on this problem. The source describes the problem as "genuinely hard" and not training-data-obvious in the way simpler proofs are.
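The 15x figure can be reproduced from the numbers quoted above. This is a small sketch that takes the article's blended per-1,000-token rates and approximate token counts at face value; the function name `cost_per_query` is an assumption made for illustration.

```python
def cost_per_query(price_per_1k_tokens: float, tokens: int) -> float:
    """Cost in dollars for one query at a flat per-1,000-token price."""
    return price_per_1k_tokens * tokens / 1000


# Figures as quoted in the article (approximate token counts).
opus_solo = cost_per_query(0.075, 800)      # about $0.06
haiku_pair = cost_per_query(0.0025, 1600)   # about $0.004
ratio = opus_solo / haiku_pair

print(f"Opus solo: ${opus_solo:.3f}, Haiku + Haiku: ${haiku_pair:.3f}, ratio: {ratio:.1f}x")
```

Note that the Haiku configuration uses about twice as many tokens in total, so the saving comes entirely from the lower per-token rate.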

The source notes that on clean problems where Fermat's Little Theorem does the heavy lifting (each a^(p-1) ≡ 1 (mod p), so the sum is p-1 ones, and p-1 ≡ -1 (mod p)), the auditor pattern adds about a 17% time tax to confirm correctness, although the timings reported above (~8.7 s versus ~10.9 s) work out to roughly 25%. The pattern is most valuable on problems where the generator might stumble, for example through what the source calls "quantization stutter" or hallucinated algebra.

📖 Read the full source: r/ClaudeAI
