How to Run 8 Local Agents with vLLM, Claude Code, and gpt-oss-120b

A developer shared their experience building a fully local, parallel multi-agent coding setup on Linux after switching from Windows. The configuration uses vLLM for parallel inference, Claude Code for agent orchestration, and gpt-oss-120b as the coding model.

Setup Components

vLLM Docker container: Used for easy deployment and parallel inference
Claude Code: Handles vibecoding and Agent Teams orchestration, configured to point at the vLLM localhost endpoint instead of a cloud provider
gpt-oss-120b: Serves as the coding agent
RTX Pro 6000 Blackwell MaxQ: Primary GPU for the workload
Dual-boot Ubuntu: Operating system setup
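
The source does not give the developer's exact commands, so the following is only a minimal sketch of how these components might be wired together. It assumes the official vllm/vllm-openai Docker image, the Hugging Face model id openai/gpt-oss-120b, port 8000, and a cap of 8 concurrent sequences. All of these are illustrative choices, not the developer's confirmed settings. It also assumes the local server exposes an API that Claude Code can talk to, either natively or through a translating proxy. That depends on the vLLM version and is not confirmed by the source.

```python
import os
import shlex

VLLM_PORT = 8000                   # assumption: port published on localhost
MODEL_ID = "openai/gpt-oss-120b"   # assumption: Hugging Face id of the model


def vllm_docker_command(max_agents: int = 8) -> list[str]:
    """Build a docker command that serves the model with vLLM."""
    return [
        "docker", "run", "--gpus", "all", "--ipc=host",
        "-p", f"{VLLM_PORT}:8000",
        "vllm/vllm-openai:latest",
        "--model", MODEL_ID,
        # Optional cap on sequences scheduled at once (assumed value).
        "--max-num-seqs", str(max_agents),
    ]


def claude_code_env(base_env=None) -> dict[str, str]:
    """Return an environment that points Claude Code at the local server."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = f"http://localhost:{VLLM_PORT}"
    env["ANTHROPIC_AUTH_TOKEN"] = "local-placeholder"  # hypothetical dummy token
    env["ANTHROPIC_MODEL"] = MODEL_ID
    return env


if __name__ == "__main__":
    # Print the pieces instead of running them, so nothing is started by accident.
    print(shlex.join(vllm_docker_command()))
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_MODEL"):
        print(f"{key}={claude_code_env({})[key]}")
```

The functions only build the command and the environment. Actually launching them (for example with subprocess.run) is left out on purpose, since the right flags depend on the GPU and the vLLM release in use.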

Performance and Workflow Improvements

The developer previously used Ollama and LM Studio but found that both processed requests sequentially and slowed down after multiple message turns and tool calls. With vLLM, requests are processed in parallel, which the developer said "turbocharged" the experience.
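
To illustrate the difference between sequential and parallel request handling, here is a small sketch using asyncio. The fake_send coroutine is a hypothetical stand-in for one request to the local inference server. It only sleeps, so it does not model the per-request throughput drop a real GPU would show under load. The limit of 8 mirrors the agent count mentioned in the post.

```python
import asyncio
import time


async def fake_send(prompt: str) -> str:
    """Hypothetical stand-in for one request to the local server."""
    await asyncio.sleep(0.05)  # pretend the model takes 50 ms to answer
    return f"done: {prompt}"


async def run_sequential(prompts, send):
    """Send requests one at a time, as a sequential backend would."""
    return [await send(p) for p in prompts]


async def run_parallel(prompts, send, max_concurrent: int = 8):
    """Send requests concurrently, with at most max_concurrent in flight."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(prompt):
        async with limit:
            return await send(prompt)

    # gather preserves the order of the input prompts in its result list.
    return await asyncio.gather(*(guarded(p) for p in prompts))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    prompts = [f"task {i}" for i in range(8)]
    _, t_seq = timed(run_sequential(prompts, fake_send))
    _, t_par = timed(run_parallel(prompts, fake_send))
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With a real server, fake_send would be replaced by an HTTP call, and the gain would be smaller than in this idealized sketch because concurrent requests share the same GPU.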

A video demonstration shows 4 agents collaborating simultaneously, and the developer reports that the GPU can sustain 8 agents running in parallel. The only noted drawback was reduced throughput, and the size of the drop varies from agent to agent.

Agent Team-scale tasks that previously took hours to complete sequentially can now be done in approximately 30 minutes, depending on project scope. The developer estimates that adding a second MaxQ GPU could scale the system to tens of concurrent agents.
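
The arithmetic behind a speedup of this kind can be sketched with a simple model. The numbers below are hypothetical and chosen only for illustration. They are not measurements from the post. The model assumes every task takes the same time alone, that tasks run in waves of up to "slots" agents, and that each agent runs "slowdown" times slower while sharing the GPU.

```python
import math


def sequential_minutes(n_tasks: int, minutes_per_task: float) -> float:
    """Wall-clock time when tasks run one after another."""
    return n_tasks * minutes_per_task


def parallel_minutes(n_tasks: int, minutes_per_task: float,
                     slots: int, slowdown: float) -> float:
    """Wall-clock time when up to `slots` tasks share the GPU at once."""
    waves = math.ceil(n_tasks / slots)  # batches needed to cover all tasks
    return waves * minutes_per_task * slowdown


if __name__ == "__main__":
    # Hypothetical inputs: 8 tasks of 15 minutes each, 8 slots, 2x slowdown.
    seq = sequential_minutes(8, 15)
    par = parallel_minutes(8, 15, slots=8, slowdown=2.0)
    print(f"sequential: {seq} min, parallel: {par} min, speedup: {seq / par}x")
```

Under these assumed inputs, parallel execution still wins by a wide margin even though each individual agent runs at half speed, which matches the trade-off the developer describes.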

This parallel approach makes it possible to vibecode multiple projects locally and concurrently, though it may increase latency in some scenarios. The developer found this trade-off preferable to completing projects one agent at a time.

📖 Read the full source: r/LocalLLaMA

