Cerebras releases Step-3.5-Flash-REAP models with 40% memory reduction

✍️ OpenClawRadar | 📅 Published: February 25, 2026 | 🔗 Source

What this is

Cerebras has released Step-3.5-Flash-REAP models, memory-efficient compressed variants of the larger Step-3.5-Flash model. They are pitched as smaller versions for what the source calls "potato setups," though the 121B-parameter model still requires significant resources.

Key details from the source

The models are available on Hugging Face.

The Step-3.5-Flash-REAP-121B-A11B model is compressed from 196B to 121B parameters, which the source describes as a 40% memory reduction, while maintaining near-identical performance to the full model.
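As a quick check on the headline figure, the sketch below computes the relative reduction from the two parameter counts quoted above. The helper name `percent_reduction` is hypothetical, and the calculation assumes memory scales linearly with parameter count, which the source does not state explicitly.

```python
def percent_reduction(original: float, compressed: float) -> float:
    """Relative size reduction, as a percentage of the original size."""
    return (original - compressed) / original * 100.0


# Parameter counts in billions, as quoted in the article.
FULL_PARAMS_B = 196
REAP_PARAMS_B = 121

reduction = percent_reduction(FULL_PARAMS_B, REAP_PARAMS_B)
print(f"{reduction:.1f}% fewer parameters")  # prints "38.3% fewer parameters"
```

By parameter count the cut is closer to 38%, so the "40%" in the announcement reads as a rounded figure.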

The compression uses REAP (Router-weighted Expert Activation Pruning), described as "a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts."
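To make the idea concrete, here is a toy sketch of router-weighted expert pruning. The source does not give the scoring rule, so this assumes a saliency score equal to the average of (router gate weight × L2 norm of the expert's output) over the tokens routed to each expert. The function names, the toy data, and the scoring formula are all illustrative assumptions, not Cerebras's implementation.

```python
from math import sqrt


def expert_saliency(gates, outputs, num_experts):
    """Mean of gate weight * output L2 norm over the tokens routed to each expert.

    gates[t]   : dict {expert_id: router gate weight} for token t (active experts only)
    outputs[t] : dict {expert_id: expert output vector} for token t
    """
    totals = [0.0] * num_experts
    counts = [0] * num_experts
    for token_gates, token_outputs in zip(gates, outputs):
        for e, g in token_gates.items():
            norm = sqrt(sum(v * v for v in token_outputs[e]))
            totals[e] += g * norm
            counts[e] += 1
    # Experts that never fired get a score of 0 (assumed convention).
    return [totals[e] / counts[e] if counts[e] else 0.0 for e in range(num_experts)]


def experts_to_keep(saliency, keep):
    """Indices of the `keep` highest-saliency experts, in ascending index order."""
    ranked = sorted(range(len(saliency)), key=lambda e: saliency[e], reverse=True)
    return sorted(ranked[:keep])


# Toy calibration data: 3 tokens, 4 experts, top-2 routing.
gates = [{0: 0.7, 1: 0.3}, {0: 0.6, 2: 0.4}, {1: 0.2, 3: 0.8}]
outputs = [
    {0: [3.0, 4.0], 1: [0.3, 0.4]},
    {0: [6.0, 8.0], 2: [1.0, 0.0]},
    {1: [0.0, 0.5], 3: [0.0, 2.0]},
]

scores = expert_saliency(gates, outputs, num_experts=4)
kept = experts_to_keep(scores, keep=2)
print(kept)  # [0, 3]
```

In this sketch the surviving experts are left untouched rather than merged, so each one keeps its own router entry. That is one way to read the quoted phrase about preserving the router's independent control.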

Features and capabilities

Near-lossless performance: Maintains almost identical accuracy on code generation, agentic coding, and function calling tasks compared to the full 196B model
40% memory reduction: Compressed from 196B to 121B parameters, lowering deployment costs and memory requirements
Preserved capabilities: Retains all core functionalities including code generation, math & reasoning, and tool calling
Drop-in compatibility: Works with vanilla vLLM, with no source modifications or custom patches required
Optimized for real-world use: Particularly effective for resource-constrained environments, local deployments, and academic research

The source notes that, although these are "smaller versions," the 121B model still needs a fairly powerful setup.
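A back-of-the-envelope estimate shows why. The sketch below multiplies parameter count by bytes per parameter to approximate the weight-only footprint. The precision table is an assumption (the source does not say which formats are published), and the figures exclude KV cache, activations, and runtime overhead.

```python
# Assumed storage widths in bytes per parameter; not taken from the source.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight-only footprint in GB, using 1 GB = 1e9 bytes."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


for name, width in BYTES_PER_PARAM.items():
    full = weight_memory_gb(196, width)
    reap = weight_memory_gb(121, width)
    print(f"{name}: full ~{full:.1f} GB, REAP ~{reap:.1f} GB")
```

At 2 bytes per parameter, 121B parameters come to roughly 242 GB of weights alone, and even at 4 bits the estimate is about 60 GB, which is beyond a single consumer GPU.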

📖 Read the full source: r/LocalLLaMA
