NVIDIA DGX Spark Community Launches Spark Arena for Reproducible LLM Benchmarks

The NVIDIA DGX Spark community has launched Spark Arena, a reproducible benchmarking platform for open-weights large language models on DGX Spark hardware. It is meant to address inconsistent performance reports that were hard to reproduce.
Background and Problem
NVIDIA began shipping DGX Spark in mid-October 2025 as a desktop system with unified memory that can run large models locally, including inference on models of roughly 200B parameters. The community identified a recurring problem where "everyone posts partial flags, then nobody can reproduce it two weeks later."
Standardized Methodology
On October 14, 2025, u/ggerganov posted a DGX Spark performance thread in the llama.cpp project with a clear methodology: measure prefill (pp) and generation/decode (tg) throughput across multiple context depths and batch sizes, using CUDA builds of llama.cpp with the llama-bench and llama-batched-bench tools.
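As a rough illustration of what "posting the full flags" can look like, the sketch below assembles one llama-bench command line that sweeps several context depths. The model path, token counts, and depth values are hypothetical placeholders rather than the settings used in the thread; only the `-m`, `-p`, `-n`, and `-d` options of llama-bench are assumed.

```python
import shlex


def build_llama_bench_cmd(model_path, prompt_tokens, gen_tokens, depths):
    """Build an argv list for a single llama-bench sweep.

    -m: model file, -p: prompt (prefill) tokens, -n: generated tokens,
    -d: comma-separated context depths to test at.
    """
    return [
        "llama-bench",
        "-m", model_path,
        "-p", str(prompt_tokens),
        "-n", str(gen_tokens),
        "-d", ",".join(str(d) for d in depths),
    ]


# Hypothetical values, chosen only to show the shape of the command.
cmd = build_llama_bench_cmd("models/example.gguf", 2048, 32, [0, 4096, 8192])

# Print the exact command so it can be pasted into a report verbatim.
print(shlex.join(cmd))
```

Recording the printed command alongside the results is one simple way to keep a run reproducible later.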
Community Solution
The community agreed on standardized tooling for runtime image building and orchestration, along with a common recipe format, and launched Spark Arena on February 11, 2026.
Current Performance Leaders
Top decode throughput results from Spark Arena, in tokens per second (tok/s):
- gpt-oss-120b (vLLM, MXFP4, 2 nodes): 75.96 tok/s
- Qwen3-Coder-Next (SGLang, FP8, 2 nodes): 60.51 tok/s
- gpt-oss-120b (vLLM, MXFP4, single node): 58.82 tok/s
- NVIDIA-Nemotron-3-Nano-30B-A3B (vLLM, NVFP4, single node): 56.11 tok/s
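The two gpt-oss-120b entries above allow a quick look at how decode throughput changes from one node to two. The following is a minimal sketch of that arithmetic using only the listed figures; the variable names are illustrative, and "scaling efficiency" here simply means speedup divided by node count.

```python
# Decode throughput (tok/s) for gpt-oss-120b, copied from the list above.
single_node_tps = 58.82
two_node_tps = 75.96
nodes = 2

# Speedup of the 2-node run relative to the single-node run.
speedup = two_node_tps / single_node_tps

# Fraction of ideal linear scaling achieved (1.0 would be a perfect 2x).
efficiency = speedup / nodes

# Throughput attributable to each node in the 2-node run.
per_node_tps = two_node_tps / nodes

print(f"speedup: {speedup:.2f}x")
print(f"scaling efficiency: {efficiency:.0%}")
print(f"per-node throughput: {per_node_tps:.2f} tok/s")
```

On these numbers the second node adds roughly 29% more decode throughput, or about 65% of ideal linear scaling. This is only arithmetic on the published figures and does not explain the cause.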
Practical Implications
With a shared methodology and published recipes, developers get reproducible performance data to draw on when selecting and configuring open-weights LLMs for DGX Spark hardware.
📖 Read the full source: r/clawdbot