inclusionAI Releases Ling-2.6-1T: Hybrid Architecture Trillion-Parameter Model with Sparse Attention and Fast Thinking

inclusionAI has open-sourced Ling-2.6-1T, a trillion-parameter flagship model in the Ling family that targets complex real-world tasks. The model introduces a hybrid architecture that combines Multi-head Latent Attention (MLA) with Linear Attention to improve inference efficiency, lowering latency and VRAM usage for long contexts while preserving expressivity.
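To make the long-context memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes MLA layers cache one compressed latent vector per token, while linear-attention layers keep a fixed-size state per head regardless of sequence length. All layer counts and dimensions below are hypothetical placeholders, not Ling-2.6-1T's published configuration.

```python
# Illustrative memory estimate only; every number here is an assumption.

def latent_cache_bytes(seq_len, n_layers, latent_dim, bytes_per_value=2):
    """Cache that stores one latent vector per token per layer (grows with seq_len)."""
    return seq_len * n_layers * latent_dim * bytes_per_value


def linear_state_bytes(n_layers, n_heads, head_dim, bytes_per_value=2):
    """Linear attention keeps a head_dim x head_dim state per head (constant in seq_len)."""
    return n_layers * n_heads * head_dim * head_dim * bytes_per_value


def hybrid_bytes(seq_len, mla_layers, linear_layers, latent_dim, n_heads, head_dim):
    """Total per-sequence memory for a hypothetical mix of MLA and linear layers."""
    return (latent_cache_bytes(seq_len, mla_layers, latent_dim)
            + linear_state_bytes(linear_layers, n_heads, head_dim))


if __name__ == "__main__":
    # Hypothetical split: 8 MLA layers, 56 linear-attention layers.
    for seq_len in (8_000, 128_000):
        total = hybrid_bytes(seq_len, mla_layers=8, linear_layers=56,
                             latent_dim=512, n_heads=32, head_dim=128)
        print(f"{seq_len:>7} tokens: {total / 2**20:.1f} MiB")
```

Under these assumptions, only the MLA share of the cache scales with context length, which is the intuition behind the reduced VRAM usage for long inputs.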
Fast Thinking via Reward Strategy
Post-training uses a Contextual Process Redundancy Suppression reward strategy that encourages shorter, more direct outputs. This "fast thinking" mechanism reduces reliance on verbose chains of thought, cutting token overhead while maintaining performance.
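The source does not give the formula behind this reward strategy. As a loose illustration of the general idea of rewarding correct answers while discouraging redundant tokens, the sketch below uses a simple length penalty. The function name `shaped_reward`, the token `budget`, and the `penalty` weight are all hypothetical and should not be read as inclusionAI's actual method.

```python
# Generic length-penalized reward; NOT the actual Contextual Process
# Redundancy Suppression formula, which the source does not specify.

def shaped_reward(is_correct: bool, n_tokens: int, budget: int,
                  penalty: float = 0.5) -> float:
    """Return 1.0 for a correct answer within budget, less if it runs long.

    Incorrect answers always score 0.0, so brevity alone is never rewarded.
    """
    if not is_correct:
        return 0.0
    overflow = max(0, n_tokens - budget)
    # Fraction of the budget exceeded, capped at 1.0 so the penalty is bounded.
    excess = min(1.0, overflow / budget)
    return 1.0 - penalty * excess


if __name__ == "__main__":
    print(shaped_reward(True, 150, budget=200))   # within budget
    print(shaped_reward(True, 300, budget=200))   # 50% over budget
    print(shaped_reward(False, 50, budget=200))   # short but wrong
```

Gating the penalty on correctness is one possible design choice: it avoids pushing the policy toward short but wrong answers.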
Benchmark SOTA
Ling-2.6-1T achieves state-of-the-art (SOTA) results among open-source models on execution-heavy benchmarks:
- AIME26 (reasoning)
- SWE-bench Verified (software engineering)
- BFCL-V4 (function calling)
- TAU2-Bench (task completion)
- IFBench (instruction following)
Agent Integration
The model is designed for end-to-end engineering workflows, from code generation to bug fixing, and integrates with mainstream agent frameworks including Claude Code, OpenClaw, OpenCode, and CodeBuddy. It is built to handle multi-tool, multi-step tasks under the constraints of enterprise environments.
📖 Read the full source: r/LocalLLaMA