antirez's DS4: Running DeepSeek V4 Flash with 1M Context on Mac Metal and DGX

✍️ OpenClawRadar📅 Published: May 10, 2026🔗 Source
antirez's DS4: Running DeepSeek V4 Flash with 1M Context on Mac Metal and DGX
Ad

Redis creator Salvatore Sanfilippo (antirez) just released a new project called DS4 on GitHub. The goal: get DeepSeek V4 Flash running with a 1M token context window on Apple Silicon (Metal) hardware. He also posted a video of it running on an NVIDIA DGX system.

What DS4 Does

DS4 leverages novel techniques to fit a 1M context window for DeepSeek V4 Flash on Mac Metal hardware (e.g., M-series chips). It's also been demonstrated on a DGX, suggesting it could work on high-end GPUs like the Pro 6000 at slightly smaller context windows with higher speed. There's speculation about future AMD support.

What's Included

  • Server endpoints: The DS4 server already provides OpenAI and Anthropic-compatible API endpoints, making it easy to plug into agentic coding tools like Cursor, Continue.dev, or custom agents.
  • GitHub repo: https://github.com/antirez/ds4/ — check the README for setup instructions, which likely involve compiling with Metal support and downloading the DeepSeek V4 Flash weights.
  • Video demo: A few hours ago, antirez posted a video on X showing it running on a DGX: https://x.com/antirez/status/2053381973226184749
Ad

Who It's For

Developers with high-end Mac hardware (e.g., Mac Studio, MacBook Pro with M1 Max/Ultra or M2/M3) or NVIDIA GPUs who want to run a powerful local LLM with a very large context window for coding agents or research.

Community Call to Action

The Reddit poster encourages anyone with powerful hardware to check out the project and contribute — whether by testing, reporting bugs, or optimizing for AMD GPUs. The project is early stage, so community involvement could accelerate compatibility.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also