DeepSeek-V4 Pro and Flash: 1.6T Parameters, 1M Token Context, Hybrid Attention

✍️ OpenClawRadar | 📅 Published: April 24, 2026

DeepSeek AI has released a preview of the DeepSeek-V4 series on Hugging Face. The lineup includes two Mixture-of-Experts (MoE) language models:

  • DeepSeek-V4-Pro: 1.6 trillion total parameters, 49 billion activated per token
  • DeepSeek-V4-Flash: 284 billion total parameters, 13 billion activated per token

Both models support a context length of one million tokens.
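As a quick illustration of what the parameter figures above imply, the sketch below computes the share of each model's parameters that is active for any single token. It uses only the counts quoted in this article; the dictionary layout and the function name are illustrative choices, not anything taken from DeepSeek's code.

```python
# Parameter counts as stated in the article, in billions.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters activated per token (0..1)."""
    return active_b / total_b


for name, params in MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")

# DeepSeek-V4-Pro: 3.1% of parameters active per token
# DeepSeek-V4-Flash: 4.6% of parameters active per token
```

In other words, per-token compute scales with the activated parameters (49B or 13B) rather than with the full model size, which is the usual motivation for an MoE design.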

Architectural Upgrades

The V4 series introduces a hybrid attention mechanism combining:

  • Compressed Sparse Attention (CSA)
  • Heavily Compressed Attention (HCA)

At the 1M-token context length, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared to DeepSeek-V3.2.
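The two percentages above are costs relative to DeepSeek-V3.2, so they can be restated as reductions or as "times cheaper" factors. The short sketch below does that arithmetic; the helper name `savings` is hypothetical and the only inputs are the 27% and 10% figures quoted in this article.

```python
def savings(relative_cost_pct: float) -> tuple[float, float]:
    """Convert 'costs X% of the baseline' into (reduction in %, reduction factor)."""
    reduction_pct = 100 - relative_cost_pct
    factor = 100 / relative_cost_pct
    return reduction_pct, factor


# Figures quoted for DeepSeek-V4-Pro vs. DeepSeek-V3.2 at 1M tokens.
flops_reduction, flops_factor = savings(27)  # single-token inference FLOPs
kv_reduction, kv_factor = savings(10)        # KV cache size

print(f"FLOPs: {flops_reduction}% lower, about {flops_factor:.1f}x cheaper")
print(f"KV cache: {kv_reduction}% lower, about {kv_factor:.0f}x smaller")
```

By this reading, the KV cache saving (roughly tenfold) is the larger of the two, which matters most for fitting a 1M-token context into accelerator memory.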

Additionally, the models incorporate Manifold-Constrained Hyper-Connections (mHC) to strengthen residual connections, improving training stability.

Model Details

  • Repository: deepseek-ai/DeepSeek-V4-Pro on Hugging Face
  • Pipeline tag: text-generation
  • Auto model class: AutoModelForCausalLM
  • License: MIT
  • Weights: sharded safetensors, including BF16, F32, F8_E8M0, F8_E4M3, and INT8 formats
  • Parameter count reported by the safetensors metadata: ~862 billion. This is well below the 1.6 trillion total stated above, and the source does not explain the gap.
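Given the repository name and Auto class listed above, loading would presumably follow the standard Hugging Face Transformers pattern. The following is a minimal sketch under stated assumptions, not a tested recipe: it assumes an installed `transformers` version that supports this architecture, that `accelerate` is available for `device_map="auto"`, and that enough accelerator memory exists for a checkpoint of this size. Whether the repository also requires `trust_remote_code=True` is not stated in the source. The function name `load_model` is hypothetical.

```python
REPO_ID = "deepseek-ai/DeepSeek-V4-Pro"  # repository named in the article


def load_model(repo_id: str = REPO_ID):
    """Load tokenizer and model via the Auto classes.

    Warning: calling this downloads the full sharded checkpoint.
    """
    # Imported lazily so that defining this function does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",  # keep the dtypes stored in the checkpoint
        device_map="auto",   # assumption: accelerate is installed to shard across devices
    )
    return tokenizer, model


# Usage (not executed here because of the checkpoint size):
# tokenizer, model = load_model()
```

In practice, a model of this scale is more likely to be served through a dedicated inference engine than loaded in a single Python process, so treat the snippet as a description of the interface rather than a deployment plan.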

Benchmarks and Efficiency

The technical report (not yet fully public) attributes the long-context efficiency gains to the hybrid attention. In the 1M-token setting, DeepSeek-V4-Pro needs 73% fewer single-token inference FLOPs and 90% less KV cache than DeepSeek-V3.2; these are the 27% and 10% figures above, restated as savings.

For developers running long-context applications (e.g., document analysis, codebase understanding, multi-turn agents), this means DeepSeek-V4 can use the full 1M-token window at a fraction of the per-token compute and KV cache memory that V3.2 would need.

Who It's For

This release targets developers building AI agents that need to process very long documents, large codebases, or multi-turn conversations with full context retention.

📖 Read the full source: HN AI Agents
