OpenRouter's Healer Alpha stealth model appears to be an unreleased Qwen 3.5-Omni variant

Technical specifications and evidence
Healer Alpha is described as having "vision, hearing, reasoning, and action capabilities" with native perception of visual and audio inputs. The model accepts text, image, audio, and video inputs and outputs text with a maximum output length of 65,536 tokens.
The 262,144-token context window is a key identifier: this exact number (2^18) matches Qwen 3.5's native context length precisely, rather than being rounded to 256K. Other models use different lengths: GPT-5.4 uses 272K, Gemini uses 1M, and Claude uses 200K-1M.
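The power-of-two observation is easy to verify. The snippet below is a minimal sketch, not part of the original analysis; the helper names are hypothetical, and the only input is the context figure quoted above.

```python
# Context length reported for Healer Alpha in the listing.
HEALER_ALPHA_CONTEXT = 262_144


def is_power_of_two(n: int) -> bool:
    """Return True if n is an exact power of two (hypothetical helper)."""
    return n > 0 and (n & (n - 1)) == 0


def exponent_of_two(n: int) -> int:
    """Return k such that n == 2**k; raise if n is not a power of two."""
    if not is_power_of_two(n):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


if __name__ == "__main__":
    print(is_power_of_two(HEALER_ALPHA_CONTEXT))   # True
    print(exponent_of_two(HEALER_ALPHA_CONTEXT))   # 18
    # Gap between the exact figure and a decimal "256K" (256,000).
    print(HEALER_ALPHA_CONTEXT - 256_000)          # 6144
```

A matching context length alone is weak evidence, since 2^18 is a natural choice for any vendor, which is why the post treats it as one signal among several.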
Architecture knowledge and capabilities
When asked about Qwen architectures, Healer Alpha produced a 2,000+ word technical explanation covering:
- Qwen3-Omni Thinker-Talker architecture with reasoning/generation split
- Cross-modal fusion and CosyVoice vocoder integration
- GDN (gated normalization mechanism) and MoE expert routing
- 262K context handling using Ring Attention, KV cache optimization, FlashAttention tiling, YaRN/NTK-aware RoPE scaling, and curriculum learning
In contrast, when asked about DeepSeek or xAI architectures, it returned minimal or no responses.
Chinese language proficiency and error metadata
The model demonstrated native-level classical Chinese poetry composition, writing a 七言绝句 (a seven-character quatrain) about AI with proper tonal structure and classical imagery. It also provided a literary analysis of its own poem.
During heavy probing, error responses revealed metadata: {"message": "Provider returned error", "code": 502, "metadata": {"provider_name": "Stealth"}}
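For readers who want to log this field while probing, the sketch below pulls the provider name out of an error body shaped like the one quoted above. It assumes the error arrives as a JSON string with exactly that structure; the function name is hypothetical and no network call is made.

```python
import json
from typing import Optional

# Error body as quoted in the post.
RAW_ERROR = (
    '{"message": "Provider returned error", "code": 502, '
    '"metadata": {"provider_name": "Stealth"}}'
)


def provider_from_error(raw: str) -> Optional[str]:
    """Return metadata.provider_name from an error body, or None if absent."""
    payload = json.loads(raw)
    metadata = payload.get("metadata") or {}
    return metadata.get("provider_name")


if __name__ == "__main__":
    print(provider_from_error(RAW_ERROR))  # Stealth
```

Here the provider name only confirms that the model is served as a stealth listing; it does not reveal which lab is behind it.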
Model identification reasoning
The analysis suggests this could be a merged "Qwen 3.5-Omni" variant combining Qwen 3.5's 262K context and hybrid GDN-MoE architecture with Qwen3-Omni's audio/video capabilities. That would make it a new, unreleased model, consistent with OpenRouter's pattern of stealth-testing unreleased models that need real-world data before launch.
The use of "hearing" instead of "audio" in the description matches Qwen3-Omni's emphasis on end-to-end speech/audio understanding. The model refuses to identify itself in structured self-assessment tests, maintaining its stealth nature.
📖 Read the full source: r/LocalLLaMA