IDP Leaderboard benchmark shows Claude Sonnet 4.6 matches Opus 4.6 for document AI tasks

The IDP Leaderboard, an open benchmark for document AI, has published results comparing Claude models on document processing tasks. The benchmark tested 16 models across multiple categories using over 9,000 real documents.
Benchmark Results
Overall scores for the Claude models on the IDP Leaderboard:
- Claude Sonnet 4.6: 80.8 overall
- Claude Opus 4.6: 80.3 overall
- Claude Haiku 4.5: 69.6 overall
Only 0.5 points separate Sonnet and Opus overall, and the two performed essentially the same on extraction tasks, including text, tables, formulas, and layout analysis. According to the benchmark results, their radar charts look virtually identical.
Cost Comparison
The source also highlights a significant cost difference:
- Sonnet costs $24 per 1,000 pages
- Opus costs $40 per 1,000 pages
For document processing workloads, the results suggest there is little reason to choose Opus: Sonnet delivers equivalent performance at 40% lower cost.
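The cost gap comes down to simple arithmetic. The following is a minimal Python sketch using the per-1,000-page prices and overall scores reported above. It assumes cost scales linearly with page count (no volume discounts) and uses a hypothetical 100,000-page workload. Neither assumption comes from the source, and the helper names are illustrative only.

```python
# Per-1,000-page prices and overall scores as reported by the IDP Leaderboard.
# Assumption (not from the source): cost scales linearly with page count.
PRICES_USD_PER_1K = {"Claude Sonnet 4.6": 24.0, "Claude Opus 4.6": 40.0}
OVERALL_SCORES = {"Claude Sonnet 4.6": 80.8, "Claude Opus 4.6": 80.3}


def batch_cost(usd_per_1k_pages: float, pages: int) -> float:
    """Estimated cost in USD for processing `pages` pages (hypothetical helper)."""
    return usd_per_1k_pages * pages / 1000


def savings_pct(cheaper: float, pricier: float) -> float:
    """Percentage saved by choosing the cheaper per-page price (hypothetical helper)."""
    return (pricier - cheaper) * 100 / pricier


if __name__ == "__main__":
    pages = 100_000  # hypothetical workload size
    for model, price in PRICES_USD_PER_1K.items():
        cost = batch_cost(price, pages)
        print(f"{model}: score {OVERALL_SCORES[model]}, ${cost:,.2f} for {pages:,} pages")
    pct = savings_pct(PRICES_USD_PER_1K["Claude Sonnet 4.6"], PRICES_USD_PER_1K["Claude Opus 4.6"])
    print(f"Choosing Sonnet saves {pct:.0f}%")
```

For 100,000 pages, this estimates $2,400.00 for Sonnet and $4,000.00 for Opus, a 40% saving for a slightly higher overall score.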
Important Caveat
One notable finding: Claude models applied stricter content moderation, which affected performance on certain document types. Old newspaper scans, textbook pages, and historical documents sometimes triggered content filters. This issue appeared only in the OlmOCR and OmniDoc benchmarks.
All predictions from the benchmark are visible in the Results Explorer at idp-leaderboard.org, where you can see exactly what each Claude model output on every document.
📖 Read the full source: r/ClaudeAI