vLLM Hosting on 2x Modded 2080 Ti: Real-World Setup Guide

A Reddit user on r/openclaw describes a local AI hosting setup built on two modded 22GB 2080 Ti GPUs bought from Alibaba, connected via NVLink, and running vLLM instead of Ollama to get tensor parallelism. They are targeting a 20-30B parameter model and ask the community for recommendations suited to light coding work, homelab maintenance, RAG, email triage, and document creation, with heavy coding tasks handed off to a Codex OAuth service.

Key details from the post:

Hardware: 2x 22GB (modded) 2080 Ti from Alibaba, likely former mining cards, joined by an NVLink bridge.
Software: vLLM chosen over Ollama explicitly to leverage tensor parallelism across both GPUs.
Goal: Run a local model in the 20-30B parameter range for OpenClaw, with tasks including light coding, homelab management, RAG, email triage, and document generation.
Mood: The poster expresses buyer's remorse and is looking for validation or practical model suggestions.
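
The software choice above can be made concrete with a minimal sketch of how a two-GPU vLLM server might be launched. It is not the poster's actual configuration: the model name `your-org/your-model` is a hypothetical placeholder, and the context length and memory fraction are assumed values to tune. The flags themselves (`--tensor-parallel-size`, `--dtype`, `--gpu-memory-utilization`, `--max-model-len`) are standard `vllm serve` options. `--dtype half` is set because the 2080 Ti is a Turing card without bfloat16 support.

```python
import shlex


def build_vllm_command(
    model: str,
    num_gpus: int = 2,
    max_model_len: int = 8192,             # assumed context length, tune to taste
    gpu_memory_utilization: float = 0.90,  # assumed fraction of VRAM vLLM may claim
) -> list[str]:
    """Build a `vllm serve` command that shards the model across `num_gpus`."""
    return [
        "vllm", "serve", model,
        # Split each layer's weights across both cards (tensor parallelism).
        "--tensor-parallel-size", str(num_gpus),
        # Turing GPUs such as the 2080 Ti lack bfloat16, so request float16.
        "--dtype", "half",
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--max-model-len", str(max_model_len),
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a model named in the post.
    cmd = build_vllm_command("your-org/your-model")
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without GPUs. The printed line can then be pasted into a shell on the actual host.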

The community discussion (linked below) offers firsthand accounts of similar setups, model recommendations (e.g., CodeLlama, DeepSeek Coder, or general-purpose models like Mixtral 8x7B), and tips on memory optimization and prompt engineering for vLLM. Some commenters caution about the modded GPUs' reliability and suggest testing with smaller models first.
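
To put the memory question in numbers, here is a rough back-of-the-envelope sketch of how much VRAM the weights alone would need at different precisions, compared with the 2x 22GB available. The assumptions are mine, not the thread's: each card is treated as 22 GiB, vLLM is allowed 90% of it, and a flat 4 GiB is reserved for KV cache and activations. Real usage depends on context length, batch size, and the quantization format.

```python
TOTAL_VRAM_GIB = 2 * 22  # two modded 22GB cards, treated as GiB (assumption)


def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(
    params_billion: float,
    bits_per_weight: int,
    utilization: float = 0.90,     # assumed share of VRAM the server may use
    kv_reserve_gib: float = 4.0,   # assumed headroom for KV cache and activations
) -> bool:
    """Return True if weights plus reserved headroom fit in the usable VRAM."""
    usable = TOTAL_VRAM_GIB * utilization
    return weight_memory_gib(params_billion, bits_per_weight) + kv_reserve_gib <= usable


if __name__ == "__main__":
    for params in (20, 30):
        for bits in (16, 8, 4):
            size = weight_memory_gib(params, bits)
            verdict = "fits" if fits(params, bits) else "too large"
            print(f"{params}B @ {bits}-bit: {size:.2f} GiB of weights -> {verdict}")
```

Under these assumptions, a 20B model at 16 bits needs about 37.25 GiB for weights alone and does not leave the reserved headroom, while 8-bit or 4-bit quantization brings both ends of the 20-30B range within budget.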

📖 Read the full source: r/openclaw

Local vLLM Hosting on 2x Modded 2080 Ti for OpenClaw: Real-World Experience
