Cerebras releases Step-3.5-Flash-REAP models with 40% memory reduction

✍️ OpenClawRadar📅 Published: February 25, 2026🔗 Source
Cerebras releases Step-3.5-Flash-REAP models with 40% memory reduction
Ad

What this is

Cerebras has released Step-3.5-Flash-REAP models, which are memory-efficient compressed variants of their larger models. These are smaller versions designed for what the source calls "potato setups," though the 121B parameter model still requires significant resources.

Key details from the source

The models are available on Hugging Face:

The Step-3.5-Flash-REAP-121B-A11B model is compressed from 196B to 121B parameters, representing a 40% memory reduction while maintaining near-identical performance to the full model.

The compression uses REAP (Router-weighted Expert Activation Pruning), described as "a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts."

Ad

Features and capabilities

  • Near-lossless performance: Maintains almost identical accuracy on code generation, agentic coding, and function calling tasks compared to the full 196B model
  • 40% memory reduction: Compressed from 196B to 121B parameters, lowering deployment costs and memory requirements
  • Preserved capabilities: Retains all core functionalities including code generation, math & reasoning, and tool calling
  • Drop-in compatibility: Works with vanilla vLLM - no source modifications or custom patches required
  • Optimized for real-world use: Particularly effective for resource-constrained environments, local deployments, and academic research

The source notes that while these are "smaller versions," the 121B model still requires a fairly powerful setup despite the compression.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also

AI Subscription Pricing Crash: Why Your Enterprise Bill Is About to 10x
News

AI Subscription Pricing Crash: Why Your Enterprise Bill Is About to 10x

AI labs like OpenAI, Anthropic, and Microsoft are losing money on every subscription seat. Agentic workloads have broken the flat-fee model — GitHub Copilot moves to usage-based billing June 1, 2026. Enterprises that built on subsidized pricing face a correction.

OpenClawRadar
Meta tracking employee computer interactions for AI agent training
News

Meta tracking employee computer interactions for AI agent training

Meta is installing tracking software on US employee computers to capture mouse movements, clicks, and keystrokes for training AI models that can perform work tasks autonomously. The tool runs on work-related apps and websites and takes occasional screen snapshots for context.

OpenClawRadar
PostmarketOS February 2026 Update: Generic Kernels and AI Policy
News

PostmarketOS February 2026 Update: Generic Kernels and AI Policy

PostmarketOS now offers generic kernel packages (linux-postmarketos-mainline, -stable, -lts) and has updated its AI policy to explicitly forbid generative AI. The project also saw contributor changes and hardware CI improvements.

OpenClawRadar
Berkeley Study: All AI Revision Prompts Drift Prose Toward Formality, Even "Preserve Voice"
News

Berkeley Study: All AI Revision Prompts Drift Prose Toward Formality, Even "Preserve Voice"

New paper from Berkeley measures 300 personal narratives through Claude, ChatGPT, and Gemini under three prompt conditions. Every model and condition reduces contractions, first-person pronouns, and narrative closeness — the "preserve voice" prompt only reduces drift magnitude, not direction.

OpenClawRadar