Open-source LLMs outperform Claude Opus 4.6 in trading strategy generation at lower cost

A Reddit user on r/LocalLLaMA conducted a comparative test of 10 different large language models to evaluate their performance in generating trading strategies. The results challenge assumptions about cost-performance relationships in commercial LLMs.
Test methodology and models
The user gave 10 LLMs the same prompt: "create the best trading strategy." The models named in the post include:
- Claude Opus 4.6
- Gemini 3, 3.1 Pro, and GPT-5.2
- Gemini Flash 3, GPT-5-mini, Kimi K2.5, and Minimax 2.5
The test was run three times to verify consistency of results.
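A repeat-run check like the one described can be sketched in a few lines of Python. The post does not say how strategies were scored or how the runs were compared, so everything here is an assumption for illustration: the model names are placeholders, the scores are invented, and `rank_run` and `summarize` are hypothetical helpers. The sketch ranks models within each run, averages the ranks, and reports which models stay in the top group in every run.

```python
from statistics import mean

def rank_run(scores):
    """Return {model: rank} for one run; rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: i + 1 for i, model in enumerate(ordered)}

def summarize(runs, top_k=4):
    """Average each model's rank across runs and find the stable top-k set."""
    ranks = [rank_run(run) for run in runs]
    models = list(ranks[0])
    mean_rank = {m: mean(r[m] for r in ranks) for m in models}
    # Models that appear in the top k of every single run.
    top_sets = [{m for m, rk in r.items() if rk <= top_k} for r in ranks]
    stable_top = set.intersection(*top_sets)
    return mean_rank, stable_top

# Placeholder data: three runs, invented scores, NOT the figures from the post.
runs = [
    {"model_a": 0.90, "model_b": 0.70, "model_c": 0.40},
    {"model_a": 0.80, "model_b": 0.85, "model_c": 0.30},
    {"model_a": 0.95, "model_b": 0.60, "model_c": 0.50},
]

mean_rank, stable_top = summarize(runs, top_k=2)
for model, rank in sorted(mean_rank.items(), key=lambda kv: kv[1]):
    print(f"{model}: mean rank {rank:.2f}")
print("In the top 2 of every run:", sorted(stable_top))
```

If the same models land in the top group in all three runs, the ordering is less likely to be a one-off. Three runs is still a small sample, so close mean ranks should be read as ties.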
Key findings
According to the source:
- Minimax 2.5 and Gemini 3.1 topped the leaderboard
- Anthropic's models (including Opus 4.6) were described as "lackluster" and did not crack the top 4
- Claude Opus 4.6 cost 10x more than competing models
- Open-source models were much slower than Anthropic and Google models
The user noted initial skepticism about the results, stating: "Honestly, I didn't believe the results the first time I did this." After verification, they concluded: "The results are legit."
Practical implications
For developers using AI coding agents, this suggests that for certain specialized tasks like trading strategy generation, open-source models may offer better performance at significantly lower cost. The main trade-off noted is speed: open-source models were described as "much slower" than the commercial alternatives from Anthropic and Google.
The user's conclusion was direct: "other than that, there's not a great reason to use Opus or Sonnet for this task."
📖 Read the full source: r/LocalLLaMA