AIME 2026 Results: Both Open and Closed Models Score Above 90%

The AIME 2026 (American Invitational Mathematics Examination) results are out, and both closed and open AI models are now scoring above 90% on this challenging mathematical reasoning benchmark.
Key Highlights
- Both proprietary (closed) and open-source models exceed 90% accuracy
- DeepSeek V3.2 can run the entire test for approximately bash.09 in API costs
- This represents a significant milestone in mathematical reasoning capabilities
What This Means
AIME is traditionally one of the most challenging high school mathematics competitions, featuring problems that require sophisticated mathematical reasoning. AI models achieving 90%+ accuracy demonstrates remarkable progress in complex reasoning abilities.
Cost Efficiency
The fact that DeepSeek V3.2 can achieve competitive results at just bash.09 for the entire test highlights the rapidly decreasing cost of advanced AI capabilities, making sophisticated reasoning more accessible.
Why This Matters
The achievement of over 90% accuracy by both closed and open AI models signifies a pivotal moment in the evolution of AI technologies. It showcases the potential for AI to assist not only in educational contexts but also in real-world applications where complex problem-solving is required. This advancement may encourage further investment and development in AI systems, particularly in areas that require high-level cognitive functions.
Key Takeaways
- The performance of AI models in AIME 2026 indicates a leap in their mathematical reasoning capabilities.
- Both proprietary and open-source models are reaching similar levels of accuracy, promoting healthy competition and innovation in the AI space.
- Cost-effective solutions like DeepSeek V3.2 are making advanced AI tools more accessible to a broader audience.
- This progress could inspire educational institutions to integrate AI tools into their curricula, enhancing learning experiences.
Getting Started
For those interested in leveraging AI for mathematical reasoning or other complex tasks, starting with tools like DeepSeek V3.2 is straightforward. Users can sign up for an API key on the DeepSeek website, enabling them to access the model's capabilities. Once registered, developers can integrate the API into their applications or use it for personal projects, allowing for experimentation with AI-driven problem-solving.
Full results: matharena.ai
📖 Read the full source: r/LocalLLaMA
👀 See Also

Claude Cowork unifies slash commands and skills under single concept
Claude Cowork has unified slash commands and skills under a single concept called 'skills', eliminating separate headers in the / menu. Legacy commands continue to function as before.

OpenAI Codex OAuth returning 429 errors since March 16 despite full quota
OpenAI Codex OAuth has been consistently returning 429 "you exceeded your current quota" errors since March 16, even when dashboards show 100% quota remaining. Users report the issue persists despite re-authentication, token revocation, and complete reconfiguration.

Claude Code CC 2.1.124 and 2.1.126: File Modification Budget Exceeded Reminder, Harness Instructions Update, REPL Awaits Clarification, and Malware Analysis Reminder Removed
CC 2.1.124 adds a system reminder for file changes omitted due to budget limits, updates harness instructions with explicit insertion points, and clarifies REPL auto-await behavior. CC 2.1.126 removes the malware analysis post-read reminder.

Open Source vs Frontier Models: Single-File Canvas Car Scene Benchmark
A developer tested 12 models including GPT-5.5, Claude Opus 4.7, and Qwen 3.6 Plus on a single-file HTML canvas car driving animation task, with results publicly compared.