AIME 2026: Models Score Over 90%

The AIME 2026 (American Invitational Mathematics Examination) results are out, and both closed and open AI models are now scoring above 90% on this challenging mathematical reasoning benchmark.

Key Highlights

Both proprietary (closed) and open-source models exceed 90% accuracy
DeepSeek V3.2 can run the entire test for approximately bash.09 in API costs
This represents a significant milestone in mathematical reasoning capabilities

What This Means

AIME is traditionally one of the most challenging high school mathematics competitions, featuring problems that require sophisticated mathematical reasoning. AI models achieving 90%+ accuracy demonstrates remarkable progress in complex reasoning abilities.

Cost Efficiency

The fact that DeepSeek V3.2 can achieve competitive results at just bash.09 for the entire test highlights the rapidly decreasing cost of advanced AI capabilities, making sophisticated reasoning more accessible.

Why This Matters

The achievement of over 90% accuracy by both closed and open AI models signifies a pivotal moment in the evolution of AI technologies. It showcases the potential for AI to assist not only in educational contexts but also in real-world applications where complex problem-solving is required. This advancement may encourage further investment and development in AI systems, particularly in areas that require high-level cognitive functions.

Key Takeaways

The performance of AI models in AIME 2026 indicates a leap in their mathematical reasoning capabilities.
Both proprietary and open-source models are reaching similar levels of accuracy, promoting healthy competition and innovation in the AI space.
Cost-effective solutions like DeepSeek V3.2 are making advanced AI tools more accessible to a broader audience.
This progress could inspire educational institutions to integrate AI tools into their curricula, enhancing learning experiences.

Getting Started

For those interested in leveraging AI for mathematical reasoning or other complex tasks, starting with tools like DeepSeek V3.2 is straightforward. Users can sign up for an API key on the DeepSeek website, enabling them to access the model's capabilities. Once registered, developers can integrate the API into their applications or use it for personal projects, allowing for experimentation with AI-driven problem-solving.

Full results: matharena.ai

📖 Read the full source: r/LocalLLaMA