Cerebras releases Step-3.5-Flash-REAP models with 40% memory reduction

What this is
Cerebras has released Step-3.5-Flash-REAP, a set of memory-efficient compressed variants of the larger Step-3.5-Flash model. The source pitches them at "potato setups," though the 121B-parameter model still requires significant resources.
Key details from the source
The models are available on Hugging Face.
The Step-3.5-Flash-REAP-121B-A11B model is compressed from 196B to 121B parameters, a cut of about 38% in parameter count that the source describes as a 40% memory reduction, while maintaining near-identical performance to the full model.
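The parameter counts quoted above can be checked with a few lines of arithmetic. The sketch below computes the fraction of parameters removed and a rough weight-only memory footprint. The bytes-per-parameter values are illustrative assumptions about common weight formats, not figures from the source, and the estimate ignores KV cache and activation memory.

```python
# Parameter counts quoted in the article, in billions.
FULL_PARAMS_B = 196
REAP_PARAMS_B = 121

# Assumed bytes per parameter for common weight formats
# (illustrative values, not taken from the source).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def reduction_fraction(full_b: float, pruned_b: float) -> float:
    """Fraction of parameters removed by pruning."""
    return (full_b - pruned_b) / full_b


def weight_memory_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    # Billions of parameters times bytes per parameter gives GB directly.
    return params_b * bytes_per_param


if __name__ == "__main__":
    removed = reduction_fraction(FULL_PARAMS_B, REAP_PARAMS_B)
    print(f"parameters removed: {removed:.1%}")
    for fmt, nbytes in BYTES_PER_PARAM.items():
        full = weight_memory_gb(FULL_PARAMS_B, nbytes)
        reap = weight_memory_gb(REAP_PARAMS_B, nbytes)
        print(f"{fmt}: {full:.1f} GB -> {reap:.1f} GB")
```

Under these assumptions the 121B model still needs roughly 242 GB for its weights alone at 16-bit precision, which is consistent with the caveat that it remains a demanding model to run.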
The compression uses REAP (Router-weighted Expert Activation Pruning), described as "a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts."
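To make the idea of router-weighted expert pruning concrete, here is a toy sketch. It is an illustration under stated assumptions, not Cerebras's implementation. The scoring rule (average of router gate weight times the norm of the expert's output, over the tokens routed to that expert) is one reading of the method's name, and the function names and calibration records are hypothetical.

```python
import math
from collections import defaultdict


def expert_saliency(records):
    """Score each expert from calibration data.

    `records` holds one (expert_id, gate_weight, expert_output) tuple per
    (token, selected expert) pair. The score is the mean of
    gate_weight * ||expert_output||_2 over the tokens routed to that expert.
    This scoring rule is an assumption made for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expert_id, gate, output in records:
        norm = math.sqrt(sum(v * v for v in output))
        totals[expert_id] += gate * norm
        counts[expert_id] += 1
    return {e: totals[e] / counts[e] for e in totals}


def experts_to_keep(saliency, num_experts, keep):
    """Return the ids of the `keep` highest-scoring experts.

    Experts that were never routed to get a score of 0. Surviving experts
    are left untouched (no merging), so the router's gate for each one
    would stay independent.
    """
    scored = [(saliency.get(e, 0.0), e) for e in range(num_experts)]
    scored.sort(reverse=True)
    return sorted(e for _, e in scored[:keep])


# Hypothetical calibration records for a layer with 4 experts.
CALIBRATION_RECORDS = [
    (0, 0.9, [3.0, 4.0]),
    (1, 0.1, [0.3, 0.4]),
    (2, 0.5, [0.0, 2.0]),
    (0, 0.7, [6.0, 8.0]),
    (3, 0.2, [1.0, 0.0]),
]

if __name__ == "__main__":
    scores = expert_saliency(CALIBRATION_RECORDS)
    print(experts_to_keep(scores, num_experts=4, keep=2))
```

In this toy data, experts 1 and 3 receive low gate weights and produce small outputs, so they are the ones dropped. A real pruning pass would compute such scores per layer over a large calibration set.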
Features and capabilities
- Near-lossless performance: Maintains almost identical accuracy on code generation, agentic coding, and function calling tasks compared to the full 196B model
- 40% memory reduction: Compressed from 196B to 121B parameters, lowering deployment costs and memory requirements
- Preserved capabilities: Retains all core functionalities including code generation, math & reasoning, and tool calling
- Drop-in compatibility: Works with vanilla vLLM - no source modifications or custom patches required
- Optimized for real-world use: Particularly effective for resource-constrained environments, local deployments, and academic research
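The drop-in compatibility claim suggests the model loads through vLLM's standard Python entry points. The following is a minimal sketch under assumptions: the Hugging Face repo id is a guess based on the model name (the source does not give the exact path), and the GPU count and sampling settings are hypothetical values that would need adjusting for real hardware.

```python
# Assumed Hugging Face repo id; the article names the model
# but not its exact repository path.
MODEL_ID = "cerebras/Step-3.5-Flash-REAP-121B-A11B"


def engine_kwargs(num_gpus: int) -> dict:
    """Keyword arguments for vllm.LLM.

    `num_gpus` is a hypothetical deployment choice, used here to shard
    the weights across devices with tensor parallelism.
    """
    return {"model": MODEL_ID, "tensor_parallel_size": num_gpus}


def generate(prompt: str, num_gpus: int = 8) -> str:
    """Load the model with stock vLLM and return one completion."""
    # Imported lazily so engine_kwargs can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(num_gpus))
    params = SamplingParams(temperature=0.2, max_tokens=256)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    print(generate("Write a Python function that reverses a string."))
```

Depending on the vLLM version and the model's packaging, extra engine arguments may be needed. The model card on Hugging Face is the place to confirm the exact repo id and any recommended flags.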
The source notes that while these are "smaller versions," the 121B model still requires a fairly powerful setup despite the compression.
📖 Read the full source: r/LocalLLaMA