Trading Strategy Benchmark: Cheaper AI Models Outperform Claude Opus 4.6

A Reddit user benchmarked 10 large language models on their ability to develop trading strategies. Cheaper models consistently outperformed more expensive ones: Claude Opus 4.6 failed to place in the top four despite costing 10 times more than some competitors.
Models Tested
- Claude Opus 4.6
- Gemini 3
- Gemini 3.1 Pro
- GPT-5.2
- Gemini Flash 3
- GPT-5-mini
- Kimi K2.5
- Minimax 2.5
Key Findings
The benchmark gave every model the same prompt: "create the best trading strategy." Minimax 2.5 and Gemini 3.1 Pro topped the leaderboard, while Anthropic's models performed poorly in comparison. Kimi K2.5 decisively outperformed Claude while costing 10 times less.
The experiment was run three times to check that the results were consistent. The author noted that being good at coding doesn't necessarily translate to being good at other tasks, such as strategy development.
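The repeated-run setup described above can be illustrated with a minimal sketch. It assumes each run produces one numeric score per model (for example, a backtest return); the original post does not specify its scoring metric. The function name `rank_models`, the model labels, and every score below are hypothetical placeholders, not results from the benchmark.

```python
from statistics import mean, stdev


def rank_models(scores_by_model):
    """Rank models by mean score across repeated runs, highest first.

    scores_by_model maps a model label to a list of per-run scores.
    Returns a list of (label, mean_score, stdev) tuples.
    """
    summary = []
    for model, scores in scores_by_model.items():
        # stdev needs at least two data points; report 0.0 for a single run.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary.append((model, mean(scores), spread))
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Placeholder scores for three runs per model. These are invented for
# illustration and are NOT figures from the Reddit benchmark.
example_scores = {
    "model_a": [0.12, 0.15, 0.09],
    "model_b": [0.20, 0.18, 0.22],
    "model_c": [0.05, 0.30, -0.10],
}

ranking = rank_models(example_scores)
for model, avg, spread in ranking:
    print(f"{model}: mean={avg:.3f} stdev={spread:.3f}")
```

Reporting the spread next to the mean is one way to see whether three runs are enough: a model with a large standard deviation, like the hypothetical `model_c` here, could change rank from one run to the next.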
This type of specialized benchmarking is useful for developers who need to select AI models for specific tasks beyond general coding assistance. The results suggest that model selection should be task-specific rather than based solely on general reputation or price.
📖 Read the full source: r/ClaudeAI