Qwen 3.5 35B Running on 8GB VRAM with llama.cpp Configuration

✍️ OpenClawRadar📅 Published: March 27, 2026🔗 Source

Local Qwen 3.5 35B Setup on Limited VRAM

A developer on r/LocalLLaMA detailed their configuration for running the Qwen 3.5 35B model locally on hardware with 8GB of VRAM. They moved from using Antigravity (with a Google AI Pro plan) to local LLMs after hitting limits with the cloud service.

Hardware and Model Specifications

The setup uses a Lenovo Legion laptop with an i9-14900HX CPU (with E-cores disabled in BIOS, 32GB DDR5 RAM) and an RTX 4060m GPU with 8GB VRAM. The specific model is Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).

Performance and llama.cpp Configuration

The developer reports getting approximately 700 tokens per second for prompt processing and 42 tokens per second for token generation with this setup. They provided their llama.cpp command-line arguments after testing:

-ngl 99 ^
--n-cpu-moe 40 ^
-c 192000 ^
-t 12 ^
-tb 16 ^
-b 4096 ^
--ubatch-size 2048 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--mlock

Workflow Integration

For their agentic workflow, they found Cline in VSCode to be the closest alternative to Antigravity. They use kat-coder-pro for Plan mode and qwen3.5 for Act mode within this setup. The developer is seeking feedback on whether this local configuration is better than sticking with Google Gemini 3 Flash in Antigravity, noting they prioritize smooth workflow over privacy concerns.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

OpenClaw Smart Router Open-Sourced for Automatic Model Selection

A developer has open-sourced a Smart Router for OpenClaw that automatically classifies queries by complexity and routes them to optimal models, saving 60-80% on API costs compared to always using premium models like Claude or GPT-4o.

Mar 16, 2026, 05:45 PM UTC

OpenClawRadar

Tools

FixAI Dev: A Consumer Rights Game Using Claude Haiku with Strict JSON Contracts

A developer built a browser game where Claude Haiku acts as a corporate AI denying consumer requests; players argue using real consumer protection laws across 37 cases in EU, US, UK, and Australia. The architecture uses Haiku for language only, with server-side game logic and strict JSON contracts between components.

Mar 31, 2026, 07:45 PM UTC

OpenClawRadar

Tools

Agents Room: Desktop App for Visualizing Claude Code Agent Teams

Agents Room is an Electron desktop application that scans for .claude/agents/ folders, reads frontmatter, and visualizes agent relationships on a canvas with automatic connection lines. It allows creating/editing agents, skills, and commands directly in the UI instead of editing markdown files.

Apr 15, 2026, 06:45 PM UTC

OpenClawRadar

Tools

Building a Self-Updating Writing Style Guide for AI-Assisted Content

A team building a voice extraction platform called Noren has developed a 117-line Markdown style guide that rewrites itself after every published piece, using Claude to enforce rules and banning AI-sounding words like 'cadence' and 'optimize'.

Mar 20, 2026, 05:45 AM UTC

OpenClawRadar