Local Multi-Agent Setup with vLLM, Claude Code, and gpt-oss-120b on Linux

A developer shared their experience building a fully local, parallel multi-agent coding setup on Linux after switching from Windows. The configuration uses vLLM for parallel inference, Claude Code for agent orchestration, and gpt-oss-120b as the coding model.
Setup Components
- vLLM Docker container: simplifies deployment and serves multiple requests in parallel
- Claude Code: handles vibecoding and Agent Teams orchestration, pointed at the local vLLM endpoint on localhost instead of a cloud provider
- gpt-oss-120b: the model that serves as the coding agent
- RTX Pro 6000 Blackwell MaxQ: the primary GPU for the workload
- Ubuntu: installed in a dual-boot configuration
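A minimal sketch of how Claude Code can be redirected to a local server, using Claude Code's `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` environment variables. The port (vLLM's default of 8000), the placeholder token, and the served model name `openai/gpt-oss-120b` are assumptions, not details from the source. The sketch also assumes the local endpoint speaks the Anthropic Messages API that Claude Code expects, either natively or through a translation proxy. The source does not say how the developer handled this.

```python
import os

# Assumed values, not taken from the source: vLLM's server listens on port 8000
# by default, and the served model name depends on how the server was launched.
VLLM_BASE_URL = "http://localhost:8000"
SERVED_MODEL = "openai/gpt-oss-120b"


def claude_code_env(base_url: str = VLLM_BASE_URL, model: str = SERVED_MODEL) -> dict[str, str]:
    """Return a copy of the current environment with Claude Code pointed at a local server."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url  # send API calls to localhost instead of the cloud
    env["ANTHROPIC_AUTH_TOKEN"] = "local-placeholder-token"  # must match the server's API key if one is set
    env["ANTHROPIC_MODEL"] = model  # model name the local server exposes
    return env


if __name__ == "__main__":
    env = claude_code_env()
    # Print shell exports; alternatively, launch with subprocess.run(["claude"], env=env).
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL"):
        print(f"export {key}={env[key]}")
```

Returning a modified copy of the environment, rather than mutating `os.environ`, keeps the redirect scoped to the Claude Code process that receives it.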
Performance and Workflow Improvements
The developer previously used Ollama and LM Studio but found that both processed requests sequentially and slowed down after several message turns and tool calls. Switching to vLLM, which processes requests in parallel, "turbocharged" the experience, in their words.
In testing, the setup ran four agents collaborating simultaneously, as shown in a video demonstration, and the developer reports that the GPU can continuously run eight agents in parallel. The only issue noted was reduced throughput, which varies by agent.
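The difference between sequential and parallel serving can be illustrated with a toy model, not a benchmark. Each simulated agent request takes a fixed, hypothetical 0.1 s, and an `asyncio.Semaphore` stands in for how many requests the server works on at once: one slot for a sequential backend, eight for a parallel one. The model does not capture vLLM's continuous batching. Because it holds per-request time constant, it also overstates the gain. In practice, batching slows each request somewhat, which is consistent with the throughput reduction the developer noted.

```python
import asyncio
import time

REQUEST_SECONDS = 0.1  # hypothetical fixed generation time per agent request


async def handle(slots: asyncio.Semaphore) -> None:
    # A request waits for a free server slot, then "generates" for a fixed time.
    async with slots:
        await asyncio.sleep(REQUEST_SECONDS)


async def run_agents(num_agents: int, max_concurrent: int) -> float:
    """Return wall-clock seconds for num_agents requests on a server with max_concurrent slots."""
    slots = asyncio.Semaphore(max_concurrent)
    start = time.perf_counter()
    await asyncio.gather(*(handle(slots) for _ in range(num_agents)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for max_concurrent in (1, 8):
        elapsed = asyncio.run(run_agents(4, max_concurrent))
        print(f"4 agents, {max_concurrent} slot(s): {elapsed:.2f}s")
```

With one slot, four agents take about 0.4 s because each request waits for the previous one to finish. With eight slots, all four run at once and finish in about 0.1 s.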
Agent Team-scale tasks that previously took hours when run sequentially now finish in roughly 30 minutes, depending on project scope. The developer estimates that adding a second MaxQ GPU could scale the system to tens of concurrent agents.
This parallel approach makes it possible to vibecode multiple projects locally and concurrently, at the cost of some added latency in certain scenarios. The developer prefers this trade-off to completing projects one agent at a time.
📖 Read the full source: r/LocalLLaMA