DeepSeek-V4 Pro and Flash: 1.6T Parameters, 1M Token Context, Hybrid Attention

DeepSeek AI has released a preview of the DeepSeek-V4 series on Hugging Face. The lineup includes two Mixture-of-Experts (MoE) language models:
- DeepSeek-V4-Pro: 1.6 trillion total parameters, 49 billion activated per token
- DeepSeek-V4-Flash: 284 billion total parameters, 13 billion activated per token
Both models support a context length of one million tokens.
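In an MoE model only the routed experts run for each token, so the activated count is a small slice of the total. The arithmetic below uses only the figures listed above to show that per-token fraction for each model. It describes compute per token, not storage: every expert still has to be held in memory. The `MODELS` dict and `active_fraction` helper are illustrative names, not part of any DeepSeek tooling.

```python
# Total and per-token activated parameter counts, as stated in the release.
MODELS = {
    "DeepSeek-V4-Pro": {"total": 1.6e12, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9, "active": 13e9},
}


def active_fraction(total: float, active: float) -> float:
    """Fraction of all parameters that participate in computing one token."""
    return active / total


for name, p in MODELS.items():
    frac = active_fraction(p["total"], p["active"])
    print(f"{name}: {frac:.2%} of parameters active per token")
```

By this arithmetic, Pro activates roughly 3.1% of its parameters per token and Flash roughly 4.6%.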
Architectural Upgrades
The V4 series introduces a hybrid attention mechanism combining:
- Compressed Sparse Attention (CSA)
- Heavily Compressed Attention (HCA)
At the 1M-token context length, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared to DeepSeek-V3.2.
Additionally, the models incorporate Manifold-Constrained Hyper-Connections (mHC) to strengthen residual connections, improving training stability.
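KV-cache memory grows linearly with context length, which is why a 10x reduction matters at 1M tokens. The sketch below uses the textbook size formula for a plain multi-head-attention cache. The configuration (layer count, KV heads, head dimension, 2-byte elements) is entirely hypothetical and does not describe DeepSeek-V3.2 or V4. The reported 10% ratio was measured against V3.2's real cache, so applying it to this made-up baseline only illustrates scale.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int) -> int:
    """Textbook KV-cache size for standard attention: one K and one V vector
    per token, per layer, per KV head (single sequence, no batching)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem


# Hypothetical configuration, chosen only for illustration.
baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8,
                          head_dim=128, bytes_per_elem=2)
# Apply the reported ratio: V4-Pro needs 10% of V3.2's KV cache at 1M tokens.
reduced = baseline * 0.10

print(f"baseline: {baseline / 2**30:.1f} GiB, at 10%: {reduced / 2**30:.1f} GiB")
```

With these invented numbers a single 1M-token sequence needs about 229 GiB of cache, and the 10% ratio brings that to about 23 GiB.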
Model Details
- Repository: deepseek-ai/DeepSeek-V4-Pro on Hugging Face
- Pipeline tag: text-generation
- Auto model class: AutoModelForCausalLM
- License: MIT
- Weights: sharded safetensors, including BF16, F32, F8_E8M0, F8_E4M3, and INT8 formats
- Parameter count reported from the safetensors metadata: ~862 billion (well below the stated 1.6 trillion total; the release details summarized here do not explain the difference)
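Since the repository metadata lists `AutoModelForCausalLM` as the auto class, loading would likely follow the standard Hugging Face Transformers pattern. This is a minimal sketch under several assumptions: that your installed `transformers` version (or the repo's own modeling code) supports this architecture, that `trust_remote_code` may or may not be required (check the model card), and that you have hardware for a model of this size. The helper names `load_kwargs`, `load`, and `generate_text` are hypothetical; `torch_dtype`, `device_map`, and `trust_remote_code` are standard `from_pretrained` arguments.

```python
REPO_ID = "deepseek-ai/DeepSeek-V4-Pro"


def load_kwargs(trust_remote_code: bool = False) -> dict:
    """Keyword arguments for from_pretrained (hypothetical helper)."""
    return {
        "torch_dtype": "auto",    # take the dtype from the checkpoint instead of defaulting to float32
        "device_map": "auto",     # spread layers across available devices (requires accelerate)
        "trust_remote_code": trust_remote_code,  # assumption: may be needed for custom model code
    }


def load(repo_id: str = REPO_ID, trust_remote_code: bool = False):
    # Imported lazily so this module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=trust_remote_code)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **load_kwargs(trust_remote_code))
    return tokenizer, model


def generate_text(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    # Tokenize, move inputs to the model's first device, and decode the continuation.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Usage would be `tokenizer, model = load()` followed by `generate_text(tokenizer, model, "Summarize this repository:")`. Keeping the imports inside `load` means nothing is downloaded until you call it.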
Benchmarks and Efficiency
The technical report (not yet fully public) states that the hybrid attention substantially improves long-context efficiency. In the 1M-token setting, DeepSeek-V4-Pro's single-token inference FLOPs are 73% lower and its KV cache is 90% smaller than DeepSeek-V3.2's.
For developers running long-context applications (e.g., document analysis, codebase understanding, multi-turn agents), this makes DeepSeek-V4 a compelling choice: very long contexts cost a fraction of the compute and memory they did on V3.2.
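The efficiency figures are quoted two ways in this article, as "X% of V3.2" and as "a Y% reduction". The small conversion below, using only the reported 27% and 10% figures, checks that the two phrasings agree and expresses them as multiplicative savings. The function names are illustrative.

```python
def reduction(relative_cost: float) -> float:
    """'Needs X of the baseline' -> 'reduces cost by 1 - X'."""
    return 1.0 - relative_cost


def savings_factor(relative_cost: float) -> float:
    """How many times lower than the baseline, e.g. 0.10 -> 10x."""
    return 1.0 / relative_cost


# Reported V4-Pro figures at 1M tokens, relative to DeepSeek-V3.2.
reported = {"single-token FLOPs": 0.27, "KV cache": 0.10}

for metric, rel in reported.items():
    print(f"{metric}: {reduction(rel):.0%} reduction, {savings_factor(rel):.1f}x lower")
```

So 27% of the FLOPs is a 73% reduction (about 3.7x fewer), and 10% of the KV cache is a 90% reduction (10x smaller).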
Who It's For
This release targets developers building AI agents that need to process very long documents, large codebases, or multi-turn conversations with full context retention.
📖 Read the full source: HN AI Agents