Omnicoder-9B Review: Speed vs. Tool Calling Issues

Technical Overview

Omnicoder-9B is a coding-specific model developed by Tesslate, based on the Qwen 3.5 architecture. It's fine-tuned on top of Qwen3.5 9B using outputs from multiple models including Opus 4.6, GPT 5.4, GPT 5.3 Codex, and Gemini 3.1 Pro.

Performance Characteristics

The model demonstrates strong performance on mid-tier hardware. With 12GB of VRAM, users report consistent token generation at 15 tokens/second even with context size set to 100k. Prompt processing is notably fast at approximately 265 tokens/second. The model runs without crashing systems or causing performance degradation.

Limitations and Issues

Despite the speed advantages, Omnicoder-9B shows several limitations in practical coding scenarios:

Failed to generate a complete Super Mario clone in a standalone HTML file with a one-shot prompt
Experienced tool calling failures with MCP servers, generating MCP errors during data fetching
Issues executing write tool calls from Claude Code, though this may involve compatibility factors

IDE Integration Testing

Testing in development environments revealed mixed results:

In LM Studio with Roo Code: Disconnections occurred as token size increased to 4k, though this appears to be an integration issue rather than model-specific
The model successfully updated or wrote small scripts with token sizes between 2-3k
API requests failed for tokens above 4k without error messages
In Claude Code: Token generation felt slower compared to Roo Code, and the model failed to execute write tool calls after generating output

The user notes that Roo Code has been the most effective extension for local LLMs among Continue and other tested options.

📖 Read the full source: r/LocalLLaMA