Mercury 2: Diffusion-Based Model for Real-Time AI Coding

✍️ OpenClawRadar📅 Published: February 25, 2026🔗 Source
Mercury 2: Diffusion-Based Model for Real-Time AI Coding
Ad

What Mercury 2 Is

Mercury 2 is a diffusion-based AI model that generates tokens in parallel rather than sequentially, using a process that refines output over multiple steps. This approach differs from traditional autoregressive models that decode tokens one by one.

Technical Specifications

  • Generation method: Diffusion-based generation instead of sequential token-by-token decoding
  • Processing approach: Generates tokens in parallel and refines them over a few steps
  • Performance: Claims 1,009 tokens/sec on NVIDIA Blackwell GPUs
  • Pricing: $0.25 per 1 million input tokens, $0.75 per 1 million output tokens
  • Context window: 128K tokens
  • Reasoning capability: Tunable reasoning
  • Tool integration: Native tool use with schema-aligned JSON output
  • API compatibility: OpenAI API compatible
Ad

Target Use Cases

The developers are positioning Mercury 2 for:

  • Coding assistants
  • Agentic loops (multi-step inference chains)
  • Real-time voice systems
  • RAG/search pipelines with multi-hop retrieval

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also