antirez's DS4: Running DeepSeek V4 Flash with 1M Context on Mac Metal and DGX

Redis creator Salvatore Sanfilippo (antirez) just released a new project called DS4 on GitHub. The goal: get DeepSeek V4 Flash running with a 1M token context window on Apple Silicon (Metal) hardware. He also posted a video of it running on an NVIDIA DGX system.
What DS4 Does
DS4 leverages novel techniques to fit a 1M context window for DeepSeek V4 Flash on Mac Metal hardware (e.g., M-series chips). It's also been demonstrated on a DGX, suggesting it could work on high-end GPUs like the Pro 6000 at slightly smaller context windows with higher speed. There's speculation about future AMD support.
What's Included
- Server endpoints: The DS4 server already provides OpenAI and Anthropic-compatible API endpoints, making it easy to plug into agentic coding tools like Cursor, Continue.dev, or custom agents.
- GitHub repo: https://github.com/antirez/ds4/ — check the README for setup instructions, which likely involve compiling with Metal support and downloading the DeepSeek V4 Flash weights.
- Video demo: A few hours ago, antirez posted a video on X showing it running on a DGX: https://x.com/antirez/status/2053381973226184749
Who It's For
Developers with high-end Mac hardware (e.g., Mac Studio, MacBook Pro with M1 Max/Ultra or M2/M3) or NVIDIA GPUs who want to run a powerful local LLM with a very large context window for coding agents or research.
Community Call to Action
The Reddit poster encourages anyone with powerful hardware to check out the project and contribute — whether by testing, reporting bugs, or optimizing for AMD GPUs. The project is early stage, so community involvement could accelerate compatibility.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OmniCoder-9B: 9B Parameter Coding Agent Fine-Tuned on 425K Agentic Trajectories
Tesslate released OmniCoder-9B, a 9-billion parameter coding agent model fine-tuned on Qwen3.5-9B's hybrid architecture. It was trained on 425,000+ curated agentic coding trajectories from Claude Opus 4.6, GPT-5.4, GPT-5.3-Codex, and Gemini 3.1 Pro.

re_gent: Git for AI Coding Agents – Version Control for Agent Activity
re_gent is an open-source tool that provides version control for AI agent sessions, tracking every tool call, storing prompts and file diffs, and enabling commands like `rgt log`, `rgt blame`, and `rgt rewind` (coming soon).

bad-ass-mcp: Free, Open Source MCP for Native Desktop GUI Control via Accessibility API
bad-ass-mcp is an open source MCP server that lets Claude and other AI agents control macOS, Windows, and Linux desktops using the native accessibility layer — no screenshots, no look-move-look loops. Free alternative to Computer Use, Operator, or UiPath.

Testing AI Agents Against Real-world APIs with d3 Labs
d3 labs offers 10 free production APIs to help developers test AI agents in real-world scenarios instead of relying on unrealistic mocks.