Local Multi-Agent Setup with vLLM, Claude Code, and gpt-oss-120b on Linux

✍️ OpenClawRadar📅 Published: March 26, 2026🔗 Source
Local Multi-Agent Setup with vLLM, Claude Code, and gpt-oss-120b on Linux
Ad

A developer shared their experience creating a fully local, parallel multi-agent coding setup on Linux after switching from Windows. The configuration uses vLLM for parallel inference, Claude Code for agent orchestration, and a large language model for coding tasks.

Setup Components

  • vLLM Docker container: Used for easy deployment and parallel inference
  • Claude Code: Handles vibecoding and Agent Teams orchestration, configured to point at vLLM localhost endpoint instead of cloud providers
  • gpt-oss:120b: Serves as the coding agent
  • RTX Pro 6000 Blackwell MaxQ: Primary GPU for the workload
  • Dual-boot Ubuntu: Operating system setup
Ad

Performance and Workflow Improvements

The developer previously used Ollama and LM Studio but found they processed requests sequentially and experienced slowdowns after multiple message turns and tool calls. With vLLM, they achieved parallel processing that "turbocharged" their experience.

In testing, the setup handled 4 agents collaborating simultaneously as shown in a video demonstration, with the GPU capable of supporting 8 agents in parallel continuously. The only noted issue was throughput reduction, which varies depending on the agent.

Agent Team-scale tasks that previously took hours to complete sequentially can now be done in approximately 30 minutes, depending on project scope. The developer estimates that adding a second MaxQ GPU could potentially scale the system to handle tens of agents concurrently.

This parallel approach enables vibecoding multiple projects locally and concurrently, though it may introduce some increased latency in certain scenarios. The developer found this trade-off preferable to completing projects one agent at a time.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also