Step 3.5 Flash: Open-Source Fast Deep Reasoning

Step 3.5 Flash is an open-source foundation model built for fast, reliable deep reasoning. It uses a sparse Mixture of Experts (MoE) architecture that activates only 11 billion of its 196 billion parameters per token, roughly 5.6%. This selective activation is the basis of its claimed high "intelligence density": the model aims to compete with top proprietary models while staying responsive enough for real-time interaction.
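To make the sparsity figure concrete, here is a minimal sketch of the arithmetic. The two parameter counts come from the text. The "about 2 FLOPs per active parameter per token" estimate is a common rule of thumb and an assumption here, not a number published for this model.

```python
TOTAL_PARAMS = 196e9   # total parameters, from the text
ACTIVE_PARAMS = 11e9   # parameters activated per token, from the text


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that take part in one token's forward pass."""
    return active / total


def approx_flops_per_token(params: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per participating parameter."""
    return 2.0 * params


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
sparse_flops = approx_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_flops_per_token(TOTAL_PARAMS)

print(f"Active per token: {fraction:.1%}")
print(f"Approx. compute vs. a dense model of the same size: {sparse_flops / dense_flops:.1%}")
```

Under this rough estimate, per-token compute scales with the 11 billion active parameters rather than the full 196 billion, which is why a sparse model can be large and still fast.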

Deep Reasoning and Speed

The model incorporates 3-way Multi-Token Prediction (MTP-3), which lets it generate 100 to 300 tokens per second, peaking at 350 in single-stream coding tasks. That throughput suits complex, multi-step reasoning where responsiveness still matters.
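As a rough illustration of what these throughput figures mean in practice, the sketch below converts tokens per second into wall-clock time for one response. The response length of 3,000 tokens is a hypothetical value chosen for illustration, and the calculation ignores prompt processing and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 3000  # hypothetical length of a long reasoning trace

# 100 and 300 are the range quoted in the text; 350 is the quoted peak.
timings = {rate: generation_seconds(RESPONSE_TOKENS, rate) for rate in (100, 300, 350)}

for rate, seconds in timings.items():
    print(f"{rate} tok/s -> {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Long reasoning traces multiply the token count, so the gap between the low and high end of the range can decide whether an agent feels interactive or sluggish.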

Performance in Coding and Agent Tasks

Step 3.5 Flash performs well on agentic tasks, supported by a scalable reinforcement learning framework designed for continued self-improvement. It scored 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, indicating that it can handle sophisticated, long-horizon tasks.

Efficient Long Context Processing

It supports a 256K context window using a 3:1 Sliding Window Attention (SWA) ratio: three SWA layers for every full-attention layer. Because an SWA layer attends only to a fixed window of recent tokens rather than the whole sequence, this layout substantially reduces computational overhead compared with long-context models that use full attention in every layer.
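The 3:1 layout can be sketched as a repeating layer pattern. The example below compares how many key/value positions must be kept across all layers in the hybrid layout and in an all-full-attention layout. The layer count, the window size, the order of layers inside each group, and reading 256K as 256 × 1024 are all assumptions made for illustration; the text gives only the ratio and the context length.

```python
from typing import List


def layer_pattern(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Repeat swa_per_full sliding-window layers followed by one full-attention layer.

    The position of the full layer inside each group is an assumption.
    """
    group = ["swa"] * swa_per_full + ["full"]
    return [group[i % len(group)] for i in range(num_layers)]


def cached_positions(pattern: List[str], context_len: int, window: int) -> int:
    """Total key/value positions held across layers for one sequence."""
    return sum(
        min(window, context_len) if kind == "swa" else context_len
        for kind in pattern
    )


NUM_LAYERS = 48        # hypothetical layer count
WINDOW = 4096          # hypothetical sliding-window size
CONTEXT = 256 * 1024   # 256K context, read here as 262,144 tokens

pattern = layer_pattern(NUM_LAYERS)
hybrid = cached_positions(pattern, CONTEXT, WINDOW)
full = cached_positions(["full"] * NUM_LAYERS, CONTEXT, WINDOW)

print(pattern[:8])
print(f"Hybrid cache is {hybrid / full:.1%} of the all-full-attention cache")
```

With a window much smaller than the context, the full-attention layers dominate the cost, so the hybrid total approaches one quarter of the all-full baseline. The same reasoning applies to attention compute per generated token.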

Local Deployment and Accessibility

Designed for straightforward local deployment, Step 3.5 Flash can run on high-end consumer hardware such as the Mac Studio M4 Max and NVIDIA DGX Spark, keeping data on the local machine without compromising performance.
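To get a feel for what local deployment requires, the sketch below estimates how much memory the weights alone would need at different precisions. The 196 billion parameter count is from the text. The bit-widths and the 128 GB memory budget are hypothetical values chosen for illustration, and the estimate leaves out the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 196e9       # from the text
MEMORY_BUDGET_GB = 128.0   # hypothetical device memory budget

footprints = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

for bits, gb in footprints.items():
    verdict = "fits" if gb <= MEMORY_BUDGET_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.0f} GB ({verdict} in {MEMORY_BUDGET_GB:.0f} GB)")
```

Note that every expert's weights must stay resident even though only 11 billion parameters are active per token, so memory is sized by the total parameter count, not the active one.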

📖 Read the full source: HN AI Agents

