Fine-Tuning Qwen 14B for Discord Autocomplete

✍️ OpenClawRadar📅 Published: February 13, 2026🔗 Source
Fine-Tuning Qwen 14B for Discord Autocomplete
Ad

A developer shared their experience on how they fine-tuned the Qwen 14B model to function as an autocomplete tool using their Discord messages. This setup closely resembles tools like GitHub Copilot, where suggestions are made as you type.

The developer used approximately 250 conversations sourced from Discord, obtained through a scraping tool, as their dataset. Each conversation was formatted as chat-ml training samples, particularly focusing on messages where the user said something last, without code blocks or links. This choice indicates a focus on conversational tone rather than technical content.

The Qwen 14B model was fine-tuned using the unsloth.ai platform and QLoRA on a Kaggle GPU, with the entire training process lasting roughly 15 minutes due to the small dataset size. They then merged the fine-tuned model into a .gguf format for local use via ollama.com.

The frontend of this autocomplete tool is implemented as a Chrome extension. It captures the last few messages and the user's ongoing input to build a chat-ml prompt with the appropriate context, which is then used to generate a completion from the Ollama-provided model. A zero-width Unicode character is cleverly used to indicate where the suggestion begins, while pressing shift+tab will accept the suggestion.

Ad

The current setup is operational on Discord, with potential future expansions to support other sites. The developer also suggests experimenting with different model sizes, as the current 14B model nearly maximally uses the available memory. They propose that 4B or 8B models might be viable alternatives, albeit with potential data limitations.

Source code and further details are available on the developer's GitHub at github.com/b44ken/finetune.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also