DeepSeek V4 pricing reality check: roughly 173x cheaper cached tokens vs Opus, but capability lag acknowledged

DeepSeek V4 launched with pricing so low a Reddit user checked the math. Here are the verified numbers:
Pricing breakdown
- V4-Pro standard input: $0.145 per million tokens. Opus 4.7 input: ~$5 per million. Ratio: 34x.
- With the 75% promotional discount (through the end of May): V4-Pro input drops to $0.036 per million, 138x cheaper than Opus.
- Cache hit pricing: V4-Pro is $0.0036 per million. Opus cached is $0.625 per million. Ratio: 173x.
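The ratios above can be rechecked with a few lines of arithmetic. This is a minimal sketch that uses only the per-million prices quoted in the list. The variable names are illustrative, and the Opus input price is treated as exactly $5 even though the post gives it as approximate.

```python
# Prices in USD per million input tokens, as quoted in the list above.
V4_PRO_INPUT = 0.145
V4_PRO_CACHE_HIT = 0.0036
OPUS_INPUT = 5.00          # quoted as "~$5", treated as exact here (assumption)
OPUS_CACHED = 0.625
PROMO_DISCOUNT = 0.75      # 75% promotional discount on V4-Pro input


def ratio(expensive: float, cheap: float) -> float:
    """How many times cheaper the second price is than the first."""
    return expensive / cheap


# Discounted V4-Pro input price: 25% of the standard price.
promo_input = V4_PRO_INPUT * (1 - PROMO_DISCOUNT)

standard_ratio = ratio(OPUS_INPUT, V4_PRO_INPUT)
promo_ratio = ratio(OPUS_INPUT, promo_input)
cache_ratio = ratio(OPUS_CACHED, V4_PRO_CACHE_HIT)

print(f"promo input price: ${promo_input:.5f} per million")
print(f"standard input ratio: {standard_ratio:.1f}x")
print(f"promo input ratio:    {promo_ratio:.1f}x")
print(f"cache hit ratio:      {cache_ratio:.1f}x")
```

With these inputs the cache hit ratio comes out a little under 174, which is where the 173x figure comes from.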
The catch
As the original post notes, DeepSeek admits V4 is three to six months behind GPT-5.4 and Gemini 3.1 Pro on capability. You're not getting frontier quality at frontier-divided-by-173; you're getting last summer's frontier quality.
What this means for agentic workflows
For agentic loops with heavy caching (system prompts, tool definitions), the cache hit discount is the real story. Reusable system prompts become essentially free. The key unknown: whether the claimed 1M context window holds up under real workloads or degrades to a usable 200K, as seen with many large-window models.
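To see why the cache hit price dominates in a loop, here is a rough sketch of input cost for a hypothetical agent run. The workload numbers (a 20,000-token reusable prefix, 1,500 fresh tokens per step, 50 steps) are invented for illustration. The billing model is also an assumption: the first call pays the standard input price for the prefix, every later call gets a cache hit on it, and output tokens and any cache-write fees are ignored.

```python
# V4-Pro prices quoted in the article, in USD per million input tokens.
INPUT_PRICE = 0.145
CACHE_HIT_PRICE = 0.0036


def input_cost_uncached(prefix_tokens: int, fresh_tokens: int, steps: int) -> float:
    """Input cost if every step pays full price for prefix plus fresh tokens."""
    return steps * (prefix_tokens + fresh_tokens) * INPUT_PRICE / 1_000_000


def input_cost_cached(prefix_tokens: int, fresh_tokens: int, steps: int) -> float:
    """Input cost under the assumed model: the prefix is a cache miss on the
    first step and a cache hit on every later step; fresh tokens always pay
    the standard input price."""
    if steps <= 0:
        return 0.0
    first_step = (prefix_tokens + fresh_tokens) * INPUT_PRICE
    later_steps = (steps - 1) * (
        prefix_tokens * CACHE_HIT_PRICE + fresh_tokens * INPUT_PRICE
    )
    return (first_step + later_steps) / 1_000_000


# Hypothetical workload: system prompt + tool definitions as the cached prefix.
PREFIX_TOKENS = 20_000
FRESH_TOKENS_PER_STEP = 1_500
STEPS = 50

uncached = input_cost_uncached(PREFIX_TOKENS, FRESH_TOKENS_PER_STEP, STEPS)
cached = input_cost_cached(PREFIX_TOKENS, FRESH_TOKENS_PER_STEP, STEPS)

print(f"uncached input cost: ${uncached:.4f}")
print(f"cached input cost:   ${cached:.4f}")
print(f"savings factor:      {uncached / cached:.1f}x")
```

Under this model the fresh tokens, not the prefix, end up as the larger share of the bill once caching kicks in, so the savings depend heavily on how much of each request is actually reusable.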
📖 Read the full source: r/LocalLLaMA