IDP Leaderboard benchmark shows Claude Sonnet 4.6 matches Opus 4.6 for document AI tasks

The IDP Leaderboard, an open benchmark for document AI, has published results comparing Claude models on document processing tasks. The benchmark tested 16 models across multiple categories using over 9,000 real documents.
Benchmark Results
The Claude model scores from the IDP Leaderboard:
- Claude Sonnet 4.6: 80.8 overall
- Claude Opus 4.6: 80.3 overall
- Claude Haiku 4.5: 69.6 overall
Sonnet and Opus performed essentially the same on extraction tasks, including text, tables, formulas, and layout analysis. According to the benchmark results, the radar charts for the two models look identical.
Cost Comparison
The source notes significant cost differences:
- Sonnet costs $24 per 1,000 pages
- Opus costs $40 per 1,000 pages
For document processing workloads, the benchmark suggests there is no reason to pay for Opus: Sonnet delivers equivalent performance at 60% of the price.
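As a rough illustration of the cost figures above, the following sketch computes what each model would cost for a given page volume and how much Sonnet saves relative to Opus. The per-1,000-page prices and overall scores are the ones quoted in this article; the 50,000-page workload and the helper name `workload_cost` are hypothetical, chosen only for the example.

```python
# Prices (USD per 1,000 pages) and overall scores as quoted in the article.
COST_PER_1K_PAGES = {"Claude Sonnet 4.6": 24.0, "Claude Opus 4.6": 40.0}
OVERALL_SCORE = {"Claude Sonnet 4.6": 80.8, "Claude Opus 4.6": 80.3}


def workload_cost(model: str, pages: int) -> float:
    """Return the estimated cost in USD of processing `pages` pages."""
    return COST_PER_1K_PAGES[model] * pages / 1000


# Hypothetical workload size, for illustration only.
pages = 50_000

sonnet_cost = workload_cost("Claude Sonnet 4.6", pages)
opus_cost = workload_cost("Claude Opus 4.6", pages)

# Relative saving from choosing Sonnet over Opus, as a percentage.
savings_pct = (opus_cost - sonnet_cost) / opus_cost * 100

# Difference in overall benchmark score (Sonnet minus Opus).
score_gap = OVERALL_SCORE["Claude Sonnet 4.6"] - OVERALL_SCORE["Claude Opus 4.6"]

print(f"Sonnet: ${sonnet_cost:,.2f}  Opus: ${opus_cost:,.2f}")
print(f"Savings with Sonnet: {savings_pct:.0f}%")
print(f"Overall score gap (Sonnet - Opus): {score_gap:+.1f}")
```

The percentage saving does not depend on the page count, since both prices scale linearly with volume; only the absolute dollar difference grows with the workload.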
Important Caveat
One notable finding: the Claude models' stricter content moderation hurt performance on certain document types. Old newspaper scans, textbook pages, and historical documents sometimes triggered content filters. This issue appeared only in the OlmOCR and OmniDoc benchmarks.
All predictions from the benchmark are visible in the Results Explorer at idp-leaderboard.org, where you can see exactly what each Claude model output on every document.
📖 Read the full source: r/ClaudeAI