Exploring Step 3.5 Flash: Open-Source Model for Fast Deep Reasoning

Step 3.5 Flash is an open-source foundation model built for fast, reliable deep reasoning. It uses a sparse Mixture of Experts (MoE) architecture that activates only 11 billion of its 196 billion parameters per token. This selective activation gives it high "intelligence density": it is positioned to compete with top proprietary models while staying responsive enough for real-time interaction.
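To put the sparsity in perspective, the short calculation below works out what share of the model is active for any single token. It uses only the two parameter counts quoted above; the function name is illustrative and not part of any Step 3.5 Flash tooling.

```python
# Parameter counts as quoted in the text, in billions.
TOTAL_PARAMS_B = 196
ACTIVE_PARAMS_B = 11


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the fraction of parameters used to process one token."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 5.6%"
```

Roughly one parameter in eighteen participates in each token's forward pass, which is where the per-token compute savings of a sparse MoE come from.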
Deep Reasoning and Speed
The model incorporates 3-way Multi-Token Prediction (MTP-3), which lets it generate 100 to 300 tokens per second, peaking at 350 tokens per second on single-stream coding tasks. That throughput suits complex, multi-step reasoning where quick responses matter.
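As a rough illustration of what those throughput figures mean in practice, the sketch below converts tokens per second into wall-clock generation time. The 2,000-token response length is a hypothetical value chosen for the example, not a figure from the source, and the calculation ignores prompt processing and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


# Hypothetical length of a long reasoning trace (assumption for illustration).
RESPONSE_TOKENS = 2_000

# Throughput figures quoted in the text: typical range and single-stream peak.
for rate in (100, 300, 350):
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{rate} tok/s -> {seconds:.1f} s")
```

At the low end of the quoted range the hypothetical trace takes 20 seconds; at the peak rate it takes under 6.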
Performance in Coding and Agent Tasks
Step 3.5 Flash performs strongly on agentic tasks, backed by a scalable reinforcement learning framework designed to support ongoing self-improvement. It scores 74.4% on the SWE-bench Verified benchmark and 51.0% on Terminal-Bench 2.0, reflecting its ability to handle sophisticated, long-horizon tasks.
Efficient Long Context Processing
It supports a large 256K context window using a 3:1 Sliding Window Attention (SWA) ratio, interleaving three SWA layers with each full-attention layer. Because SWA layers attend only to a fixed-size window of recent tokens, this design significantly reduces computational overhead compared with long-context models that use full attention in every layer.
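The 3:1 ratio can be sketched as a repeating layer schedule. The example below is a minimal illustration, not the model's actual configuration: the ordering (three SWA layers followed by one full-attention layer), the 8-layer depth, and the 512-token window are all assumptions, and 256K is taken to mean 262,144 tokens. It estimates how many key positions the final token attends to across all layers, relative to an all-full-attention stack.

```python
def layer_pattern(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Repeat `swa_per_full` SWA layers followed by one full-attention layer.

    The ordering within each block is an assumption for illustration.
    """
    block = swa_per_full + 1
    return ["full" if (i + 1) % block == 0 else "swa" for i in range(num_layers)]


def attended_positions(kind: str, context_len: int, window: int) -> int:
    """Key positions visible to the last token in one layer."""
    return context_len if kind == "full" else min(window, context_len)


def relative_attention_cost(num_layers: int, context_len: int, window: int) -> float:
    """Hybrid attention work for the last token, as a fraction of all-full attention."""
    hybrid = sum(
        attended_positions(kind, context_len, window)
        for kind in layer_pattern(num_layers)
    )
    return hybrid / (num_layers * context_len)


NUM_LAYERS = 8          # hypothetical depth
WINDOW = 512            # hypothetical sliding-window size
CONTEXT_LEN = 262_144   # 256K tokens, assuming K = 1024

print(layer_pattern(NUM_LAYERS))
print(f"{relative_attention_cost(NUM_LAYERS, CONTEXT_LEN, WINDOW):.3f}")
```

Under these assumptions the hybrid stack does about a quarter of the attention work at full context length, since the cost is dominated by the one layer in four that still sees the entire context.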
Local Deployment and Accessibility
Designed for easy local deployment, Step 3.5 Flash can run on high-end consumer hardware such as the Mac Studio M4 Max and NVIDIA DGX Spark, so data can stay on the local machine without giving up performance.
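A back-of-the-envelope estimate shows why weight precision matters for fitting a 196-billion-parameter model on a single machine. The bit widths below are assumptions for illustration; the source does not say which quantization is used for local deployment, and the figures cover weights only (no KV cache, activations, or runtime overhead).

```python
def weight_gigabytes(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    return params_billions * bits_per_param / 8


TOTAL_PARAMS_B = 196  # total parameter count from the text, in billions

# Hypothetical precisions: 16-bit, 8-bit, and 4-bit weights.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gigabytes(TOTAL_PARAMS_B, bits):.0f} GB")
```

Although only 11 billion parameters are active per token, all 196 billion must be resident in memory, so total size rather than active size sets the hardware requirement.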
📖 Read the full source: HN AI Agents