$6.4k Local LLM Server: TCO vs API Costs

A developer on r/LocalLLaMA posted a detailed cost analysis of their $6,406.45 local LLM server. The analysis factors in depreciation and electricity and compares the result against API pricing. The server runs Qwen3.6 27B under llama.cpp on four used AMD MI100 32GB GPUs and processes 20.4M input tokens and 1.32M output tokens per day.

Hardware Specs

4x AMD MI100 32GB (used): $4,234.82
ASRock EPYCD8-2T motherboard: $721.61
1600W 80+ Platinum PSU: $497.95
8x8GB DDR4 ECC RDIMMs (used): $348.79
EPYC 7K62 48-core CPU (used): $254.28
CPU cooler, case, blowers, cables: ~$349
Total: $6,406.45

Performance & Cost Comparison

At OpenRouter's prices for Qwen3.6 27B ($0.29 per million input tokens and $3.20 per million output tokens), the same daily workload would cost $10.14 through the API, or $3,701.10 per year. The local server handles it for $2.11 per day in electricity (630W at $0.14/kWh), or $770.15 per year.
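Both daily figures come from simple rate arithmetic. The minimal Python sketch below recomputes them from the token volumes, prices and power draw quoted above. The constant and function names are illustrative and do not come from the post. The sketch also assumes the server draws a constant 630W for 24 hours a day.

```python
# Daily token volume reported for the server (millions of tokens per day)
INPUT_M_PER_DAY = 20.4
OUTPUT_M_PER_DAY = 1.32

# OpenRouter prices for Qwen3.6 27B quoted in the post (USD per million tokens)
PRICE_IN_PER_M = 0.29
PRICE_OUT_PER_M = 3.20

# Electricity figures quoted in the post; ASSUMES a constant draw all day
POWER_W = 630
USD_PER_KWH = 0.14


def api_cost_per_day(input_m: float, output_m: float) -> float:
    """Cost of sending the same daily tokens through the API."""
    return input_m * PRICE_IN_PER_M + output_m * PRICE_OUT_PER_M


def electricity_cost_per_day(watts: float, usd_per_kwh: float) -> float:
    """Energy cost of running at a constant draw for 24 hours."""
    kwh_per_day = watts / 1000 * 24
    return kwh_per_day * usd_per_kwh


api_day = api_cost_per_day(INPUT_M_PER_DAY, OUTPUT_M_PER_DAY)
power_day = electricity_cost_per_day(POWER_W, USD_PER_KWH)

print(f"API:         ${api_day:.2f}/day, ${api_day * 365:,.2f}/year")
print(f"Electricity: ${power_day:.2f}/day, ${power_day * 365:,.2f}/year")
```

Run as-is, the sketch prints $10.14/day ($3,701.10/year) for the API and $2.12/day ($772.63/year) for electricity. The post's $770.15/year is exactly $2.11 × 365. The one-cent gap from the exact 630W result fits either rounding or an average draw slightly below 630W.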

Depreciation Accounting

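The author uses what they consider a realistic depreciation model, based on expected resale value: accessories lose 100% of their value, new parts lose 50%, and used parts lose 10%. This gives a one-time hardware depreciation cost of $1,442.57. The loss is taken once, at resale, rather than spread over time. The author therefore treats it as roughly the same whether the server is sold after one day or after five years.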
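You can reproduce the $1,442.57 figure by applying each loss rate to the matching parts from the list above. This sketch assumes the motherboard and PSU are the "new" parts, since they are the only components not marked as used. It also takes the ~$349 accessories line as exactly $349.00. Under those two assumptions, both the purchase total and the depreciation match the post.

```python
# (part, price in USD, category) from the post's parts list.
# ASSUMPTION: the motherboard and PSU are the "new" parts (the only ones not
# marked used), and the ~$349 accessories line is exactly $349.00.
PARTS = [
    ("4x AMD MI100 32GB", 4234.82, "used"),
    ("ASRock EPYCD8-2T motherboard", 721.61, "new"),
    ("1600W 80+ Platinum PSU", 497.95, "new"),
    ("8x8GB DDR4 ECC RDIMMs", 348.79, "used"),
    ("EPYC 7K62 48-core CPU", 254.28, "used"),
    ("CPU cooler, case, blowers, cables", 349.00, "accessory"),
]

# Fraction of the purchase price lost on resale, per the author's model
LOSS_RATE = {"accessory": 1.00, "new": 0.50, "used": 0.10}


def depreciation(parts) -> float:
    """One-time loss: each part's price times its category's loss rate."""
    return sum(price * LOSS_RATE[kind] for _name, price, kind in parts)


total_price = sum(price for _name, price, _kind in PARTS)
loss = depreciation(PARTS)

print(f"Purchase price: ${total_price:,.2f}")
print(f"Depreciation:   ${loss:,.2f}")
```

The sketch prints a purchase price of $6,406.45 and a depreciation of $1,442.57. Used parts make up most of the spend, so the low 10% rate on them keeps the total loss at under a quarter of the purchase price.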

After one year, the local server's total cost is $770 (electricity) + $1,443 (depreciation) = $2,213. The API would cost $3,701 over the same period, so running locally saves $1,488.
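In this model, depreciation is paid once, while the API bill grows every day. The comparison therefore reduces to a break-even point. The rough sketch below uses the post's figures and assumes the same workload, prices and electricity rate for the whole period. The function names are hypothetical.

```python
# Figures quoted in the post (USD). ASSUMES constant workload and prices.
API_PER_DAY = 10.14
ELECTRICITY_PER_DAY = 2.11
API_PER_YEAR = 3701.10
ELECTRICITY_PER_YEAR = 770.15
DEPRECIATION = 1442.57  # one-time resale loss, treated as independent of holding time


def local_cost(years: float) -> float:
    """Local cost: electricity for the period plus the one-time depreciation."""
    return ELECTRICITY_PER_YEAR * years + DEPRECIATION


def api_cost(years: float) -> float:
    """API cost for the same workload over the same period."""
    return API_PER_YEAR * years


def break_even_days() -> float:
    """Days until the daily saving over the API has paid off the depreciation."""
    return DEPRECIATION / (API_PER_DAY - ELECTRICITY_PER_DAY)


for years in (1, 5):
    local, api = local_cost(years), api_cost(years)
    print(f"{years} yr: local ${local:,.2f} vs API ${api:,.2f}, saving ${api - local:,.2f}")

print(f"Break-even after about {break_even_days():.0f} days")
```

With these inputs, year one comes out to $2,212.72 locally versus $3,701.10 through the API. The depreciation is recovered after about 180 days, which is $1,442.57 divided by the $8.03 saved each day. The five-year line simply extends the same assumptions. It ignores hardware failures and any change in API or electricity prices.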

Coding Plan Comparison

For context, Z.AI's top coding plan ($144/month) provides about 4.5M input tokens and 200k output tokens per day of GLM 4.7. Scaled up to match the local server's daily input volume, the plan would cost $652.80/month, or $7,833.60/year. That is more than double the OpenRouter price for the same model.
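The $652.80 figure follows from scaling the plan by input volume. The minimal sketch below reproduces that arithmetic and also shows how much output the same number of plans would allow. The variable names are illustrative.

```python
# Z.AI top coding plan, as quoted in the post
PLAN_USD_PER_MONTH = 144.00
PLAN_INPUT_M_PER_DAY = 4.5    # million input tokens per day
PLAN_OUTPUT_M_PER_DAY = 0.2   # million output tokens per day (200k)

# Local server's daily throughput
LOCAL_INPUT_M_PER_DAY = 20.4
LOCAL_OUTPUT_M_PER_DAY = 1.32

# Number of plans needed to match the local server's input volume
plans_needed = LOCAL_INPUT_M_PER_DAY / PLAN_INPUT_M_PER_DAY
monthly = plans_needed * PLAN_USD_PER_MONTH
yearly = monthly * 12

# Output allowance those same plans would provide
output_covered = plans_needed * PLAN_OUTPUT_M_PER_DAY

print(f"{plans_needed:.2f} plans -> ${monthly:,.2f}/month, ${yearly:,.2f}/year")
print(f"Output allowance: {output_covered:.2f}M/day vs {LOCAL_OUTPUT_M_PER_DAY}M/day needed")
```

About 4.53 plans give $652.80/month and $7,833.60/year, which matches the post. Those plans only cover about 0.91M output tokens per day, though, against the 1.32M the server produces. Scaling by output instead would take 6.6 plans, or $950.40/month. The input-based figure therefore, if anything, understates the plan's cost.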

The author concludes that coding plans are not always good value and advises working out what you are actually paying per token.

📖 Read the full source: r/LocalLLaMA