Qwen3.5 35B-A3B MoE runs 27-step agentic workflow locally on mid-range hardware

✍️ OpenClawRadar📅 Published: March 25, 2026🔗 Source
Qwen3.5 35B-A3B MoE runs 27-step agentic workflow locally on mid-range hardware
Ad

Local agentic workflow demonstration

A developer on r/LocalLLaMA reported successfully running a complex agentic workflow locally using Qwen3.5 35B-A3B MoE. The model executed a 27-step video processing chain autonomously on mid-range hardware.

Workflow details

The task involved processing a video from a single natural language prompt:

  • Upload a video
  • Transcribe with Whisper
  • Edit the subtitles
  • Burn subtitles back into video with custom styling

The workflow consisted of 27 sequential tool calls including: extract_audio, transcribe, read_file, edit_file, burn_subtitles, plus verification steps. The model planned, executed, verified each step, and self-corrected when needed.

Ad

Technical specifications

Hardware:

  • Lenovo ThinkPad P53 mobile workstation
  • Intel i7-9850H processor
  • Quadro RTX 3000 (6GB VRAM)
  • 48GB DDR4 2666MT/s RAM

Software stack:

  • Full local implementation with llama.cpp + whisper.cpp
  • No cloud APIs used

Model configuration:

  • Qwen3.5 35B-A3B MoE at Q4_K_M quantization
  • MoE architecture with ~3B active parameters per token
  • Fits and runs on 6GB VRAM with layers offloaded
  • Full 35B parameter knowledge base

Performance results

The complete workflow ran in approximately 10 minutes, with most time spent on inference. The developer noted zero errors and zero human intervention required during the 27-step chain. The MoE architecture made this feasible on mid-range hardware by keeping active parameter count low while maintaining full model capability.

This demonstrates that local agentic workflows are becoming practical on consumer-grade hardware, particularly with MoE models that balance active parameter count for speed against full parameter count for capability.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also