Qwen2-0.5B Fine-Tuned for Local Task Automation with llama.cpp

✍️ OpenClawRadar📅 Published: March 22, 2026🔗 Source
Qwen2-0.5B Fine-Tuned for Local Task Automation with llama.cpp
Ad

A developer has fine-tuned Qwen2-0.5B for task automation, creating a model that runs entirely locally on CPU without requiring GPU or cloud APIs. The project, named ACE, is available on GitHub.

What It Does

  • Takes natural language tasks (e.g., "copy logs to backup")
  • Detects task type: atomic, repetitive, or clarification
  • Generates execution plans consisting of CLI commands and hotkeys
  • Runs entirely locally on CPU (no GPU, no cloud APIs)

Technical Details

  • Base model: Qwen2-0.5B
  • Training: LoRA fine-tuning on approximately 1000 custom task examples
  • Quantization: GGUF Q4_K_M format (300MB file size)
  • Inference: llama.cpp
  • Inference time: 3-10 seconds on i3/i5 processors
Ad

Main Challenges During Training

  • Data quality: Had to regenerate dataset 2-3 times due to garbage examples
  • Overfitting: Took multiple iterations to get validation loss stable
  • EOS token handling: Model wouldn't stop generating until tokenizer config was fixed
  • GGUF conversion: Required BF16 dtype + imatrix quantization to get stable outputs

Limitations (v0.1)

  • Requires full file paths (no smart file search yet)
  • CPU inference only (slower on older hardware)
  • Basic execution (no visual understanding)

Performance Benchmarks

  • i5 (2018+) with SSD: 3-5 seconds
  • i3 (2015+) with SSD: 5-10 seconds
  • Older hardware (Pentium + HDD): 30-90 seconds

The developer is seeking feedback on performance across different hardware, edge cases that break the model, and feature requests for v0.2.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also