OmniCoder-9B fine-tune shows strong performance for agentic coding on 8GB VRAM systems

Performance results from testing OmniCoder-9B with OpenCode
A user on r/LocalLLaMA reported testing OmniCoder-9B, a fine-tune of Qwen3.5-9B trained on Opus traces, and found it performed well for agentic coding tasks on systems with limited VRAM. The model is available on Hugging Face at Tesslate/OmniCoder-9B.
Technical setup and configuration
The user ran the Q4_K_M GGUF quantization using ik_llama with the following command:
ik_llama.cpp\build\bin\Release\llama-server.exe -m models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf -ngl 999 -fa 1 -b 2048 -ub 512 -t 8 -c 100000 -ctk f16 -ctv q4_0 --temp 0.4 --top-p 0.95 --top-k 20 --presence-penalty 0.0 --jinja --ctx-checkpoints 0
They achieved approximately 40 tokens per second with this configuration. The user noted that Q5_KS quantization with 64,000 context length provides similar speeds.
OpenCode configuration
The OpenCode configuration used for testing:
"local": { "models": { "/models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf": { "interleaved": { "field": "reasoning_content" }, "limit": { "context": 100000, "output": 32000 }, "name": "omnicoder-9b-q4_k_m", "reasoning": true, "temperature": true, "tool_call": true } }, "npm": "@ai-sdk/openai-compatible", "options": { "baseURL": "http://localhost:8080/v1" } }The user mentioned a potential bug causing full prompt reprocessing that they're investigating.
Context and comparison
The testing was motivated by concerns about quota restrictions and pricing changes in commercial AI coding tools. The user specifically mentioned having 8GB VRAM, which typically limits the ability to run capable open-source models at good speeds for agentic coding. They noted that while MOE models might offer better performance, their speeds are significantly slower.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Kanwas: Open-source shared context board for teams and AI agents
Kanwas is an open-source multiplayer workspace where teams and AI agents share documents, evidence, and decisions on a canvas with live streaming tool calls. Self-hosted via Docker, it's git-backed with Yjs and BlockNote.

Claude Code Verification Bottleneck and Browser Automation Plugin Solution
A developer reports that verification remains the slowest part of using Claude Code, requiring manual testing of features. They found a browser automation plugin that lets the agent verify real product flows before marking tasks complete.

Warp Terminal Goes Open Source with Agentic Dev Environment
Warp is now open-source, rebranding as an agentic development environment with a built-in coding agent and support for bringing your own CLI agents like Claude Code, Codex, and Gemini CLI.

Claude Session Tracker: Auto-Save Claude Code Sessions to GitHub Issues
A new tool called claude-session-tracker automatically saves Claude Code sessions to GitHub Issues, logging every prompt and response as comments with timestamps. It creates one GitHub Issue per session linked to a Projects board and works through Claude Code's native hook system without consuming context tokens.