Building a $6.4k Local LLM Server: TCO Breakdown vs API Costs

A developer on r/LocalLLaMA posted a thorough cost analysis of their $6,406.45 local LLM server, including depreciation and electricity, comparing it to API pricing. The server uses four used AMD MI100 32GB GPUs with llama.cpp running Qwen3.6 27B, processing 20.4M input tokens and 1.32M output tokens per day.
Hardware Specs
- 4x Used MI100 32GB: $4,234.82
- ASRock EPYCD8-2T motherboard: $721.61
- 1600W 80+ Platinum PSU: $497.95
- 8x8GB DDR4 ECC RDIMMs (used): $348.79
- EPYC 7K62 48-core CPU (used): $254.28
- CPU cooler, case, blowers, cables: ~$349
- Total: $6,406.45
Performance & Cost Comparison
At $0.29/M input and $3.2/M output on OpenRouter for Qwen3.6 27B, the API equivalent daily cost is $10.14, or $3,701.10/year. The local server produces the same tokens at a daily electricity cost of $2.11 (630W at $0.14/kWh), or $770.15/year.
Depreciation Accounting
The author uses a realistic depreciation model: accessories 100% loss, new parts 50% loss, used parts 10% loss. This yields a one-time hardware depreciation cost of $1,442.57, which is roughly the same whether sold after 1 day or 5 years.
After one year, total local cost = $770 (electricity) + $1,443 (depreciation) = $2,213, compared to $3,701 for API — a savings of $1,488.
Coding Plan Comparison
For context, Z.AI's top coding plan ($144/month) provides about 4.5M input/200k output tokens/day of GLM 4.7, which normalized to the same capacity as the local server would cost $652.80/month or $7,833.60/year — more than double OpenRouter pricing for the same model.
The author notes that coding plans aren't always good value, and advises checking what you're actually paying for in tokens.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Claude Code + MCP generates test suites from source code
Claude Code analyzes source code to generate hierarchical test suites covering modules, features, scenarios, happy paths, edge cases, and error handling, then pushes them to test management systems via MCP.

Reduce AI Coding Session Costs by 90% with Graph-Based Code Indexing
A developer built a local graph database that indexes a codebase using LLM-generated summaries, cutting Claude Code session costs from $6-10 to cents by avoiding redundant file re-reads.

Phaselock: An AI Agent Control System Inspired by Parenting Techniques
Phaselock is an open-source Agent Skill that implements four control mechanisms for AI agents: explicit gates before action, immediate feedback on mistakes, constrained choices, and mechanical rule enforcement. It works with Claude Code, Cursor, Windsurf, and tools supporting hooks.

OpenClaw Setup Assistance Offered by ClawSet
ClawSet provides setup services for OpenClaw, focusing on understanding client needs. The service includes a setup call for $99 and a month of troubleshooting support.