DeepSeek V4 Flash: Near-Opus Quality for Local, On-Premises LLMs

A developer on r/openclaw reports that DeepSeek 4 Flash delivers near-Opus-level performance for their local LLM use case: on-premises AI agents that handle confidential customer data. The user says that, until now, they had been extremely disappointed with every model other than Opus.

Key Details

Use case: On-premises local LLMs and AI agents for customers who refuse to use cloud services such as AWS because of data-confidentiality concerns.
Model performance: DeepSeek 4 Flash is described as "near-Opus level", making it the first viable alternative to Claude Opus the user has found for this workload.
Hardware: The user is investing in a $25,000 computer (probably a multi-GPU workstation, though the post does not say) to run the model locally. They note that even on NVIDIA GPUs, processing 1M tokens can be frustratingly slow.
Comparison: They are skeptical of Qwen 35B users, claiming that model cannot match even Sonnet for this work. They also question whether Mac users are actually running local LLMs or only claiming to, citing what they describe as unbearable slowness on Apple hardware.
Attribution: The user acknowledges that the model comes from China (DeepSeek is a Chinese AI lab) and wonders what DeepSeek gets out of releasing it, but is grateful for a free LLM that can run locally.