Local AI Development with Qwen3.6-27B and Opencode on a 5090

A developer who previously dismissed local LLMs as 'not up to standards' compared to cloud offerings like Claude Code or Cursor recently switched to a fully local setup. Using Opencode + llama-server + Qwen3.6-27B at a reasonable quantization with 128K context, running on a single RTX 5090 in a dedicated Linux box. The setup serves over the network to their main dev machine.
Key Details
- Tooling: Opencode (frontend) + llama-server (backend) + Qwen3.6-27B model
- Hardware: 1× RTX 5090, dedicated Linux machine
- Context length: 128K tokens (user unsure if it can be pushed further, but found it sufficient)
- Performance: Not perfect — occasional loops require manual interruption — but overall 'very worthwhile'
Motivation
The switch was driven by increasing usage constraints and 'enshittification' of cloud plans. Local setup eliminates worries about usage limits, prompt analysis, or account bans — particularly important for security research, scraping, or other activities that might trigger cloud provider scrutiny.
Who It's For
Developers on the fence about local AI coding agents, especially those who have been skeptical about local model quality or who need to avoid cloud account risks. If you have a powerful GPU (e.g., RTX 5090), the experience is now competitive with cloud tools.
Bottom Line
The user reports 'immensely freeing' experience despite occasional hiccups, and believes local AI development has reached the point where it's 'very worthwhile indeed.'
📖 Read the full source: r/LocalLLaMA
👀 See Also

Solo developer builds cross-platform desktop AI agent with mobile remote control in 3 weeks, ships to 40+ countries
A solo developer built Skales, a native desktop AI agent with 139+ tools and a mobile companion app for remote control — all in 3 weeks using Claude. The app runs on macOS, Windows, and Linux, is local-first and free, and already has active users in 40+ countries.

Using Claude Code to revive abandoned personal projects: a practical walkthrough
Matthew Brunelle shares how he used Claude Code (with Opus 4.6) to resurrect a stalled YouTube Music–to–OpenSubsonic API shim project, complete with setup steps, prompts, and workflow tips.

Skillware adds synthetic data generator with entropy scoring for local model fine-tuning
Skillware has released a new synthetic data generator skill that uses zlib compression-ratio heuristics to score output diversity, helping prevent model collapse. The tool works out-of-the-box with Ollama, supports Gemini/Anthropic for high-reasoning batches, and outputs JSON batches for .jsonl fine-tuning pipelines.
