Run Qwen3.6-35B-A3B-UD-Q5_K_XL Locally on AMD R9700 with VS Code Copilot

A Reddit user reports great results running the Qwen3.6-35B-A3B-UD-Q5_K_XL GGUF model locally using llama.cpp with Vulkan on a single AMD R9700 GPU. The setup served as a drop-in replacement for GitHub Copilot in VS Code, generating a complete test website and Playwright test suite with minimal intervention.

llama.cpp Startup Command

/app/llama-server -m /models/Qwen3.6-35B-A3B-UD-Q5_K_XL/Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf \
  --ctx-size 262144 --threads 8 --threads-batch 8 \
  --gpu-layers 99 --parallel 1 --flash-attn on \
  --batch-size 2048 --ubatch-size 1024 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --cache-ram 12000 --ctx-checkpoints 50 \
  --mmap --no-mmproj --kv-unified \
  --reasoning off --reasoning-budget 0 --jinja \
  --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 \
  --repeat-penalty 1.0 --presence-penalty 0.0

Key parameters: 256K context window, 99 GPU layers for full offload, flash attention enabled, and sampling config taken from the Qwen3.6-35B-A3B Hugging Face page under "precise coding".

VS Code Integration

The user configured a custom chat model in chatLanguageModels.json pointing to the local llama.cpp server:

{
  "name": "Sean Llama.cpp",
  "vendor": "customoai",
  "apiKey": "${input:chat.lm.secret.3c0c0f21}",
  "models": [
    {
      "id": "Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf",
      "name": "Qwen3.6-35B",
      "url": "https://llm.home.arpa/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "maxInputTokens": 180000,
      "maxOutputTokens": 10000,
      "family": "Qwen3",
      "inputTokenCost": 0.0001,
      "outputTokenCost": 0.0001,
      "temperature": 0.6,
      "top_p": 0.95,
      "top_k": 20,
      "repeat_penalty": 1,
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "systemMessage": "You are a precise coding assistant. Avoid repeating plans. Execute tasks directly. Do not restate intentions multiple times.",
      "timeout": 600000,
      "retry": { "enabled": true, "max_attempts": 2, "interval_ms": 1500 }
    }
  ]
}

The model correctly responded to tool calling requests, allowing it to act as a Copilot replacement.

Real-World Test: Full Stack Generation

The user fed a detailed prompt (originally from ChatGPT) asking the model to build a "Bike Shop Service Tracker" — a local-first React + TypeScript app using localStorage. Requirements included a data model, seed data, filtering, sorting, and form validation. The model generated the entire website fully functional on the first run.

Next, they prompted it to generate a complete Playwright test suite. Only one test required a manual fix — otherwise the suite ran without errors. The user's conclusion: "I think I am done tweaking and testing models (until the next big release) and can get back to coding now."