SubQ: First Fully Subquadratic LLM with 12M-Token Context and 95% RULER Accuracy

Subquadratic has released SubQ 1M-Preview, which it describes as the first fully subquadratic large language model: compute scales linearly with context length rather than quadratically, as it does in standard transformers. According to the company, this removes the need for RAG systems and chunking workarounds on long-context tasks. The research model supports up to 12 million tokens, and a 1M-token production model is available in early access.
Key Features
- Subquadratic attention: Reduces attention compute by ~1,000x compared to frontier transformer models at 12M-token context, per the source.
- SubQ Code: A CLI-based coding agent that loads an entire codebase into a single context window. It plans, executes, and reviews across a full repository in one pass, with no multi-agent orchestration.
- SubQ Search: Long-context search tool offering Deep Research capabilities at chatbot speed.
- API: Full-context API for developers and enterprise teams.
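The ~1,000x figure in the first bullet can be sanity-checked with a back-of-the-envelope cost model. The sketch below is an illustration, not SubQ's actual accounting: it counts query-key pairs only, ignores causal masking and head dimensions, and assumes a hypothetical fixed per-token attention budget (`BUDGET`), which the source does not disclose.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full attention: every token against every token."""
    return n_tokens * n_tokens


def budgeted_attention_pairs(n_tokens: int, budget_per_token: int) -> int:
    """Pairs scored when each token attends to at most a fixed number of positions.

    This grows linearly with n_tokens once n_tokens exceeds the budget.
    """
    return n_tokens * min(budget_per_token, n_tokens)


def reduction_factor(n_tokens: int, budget_per_token: int) -> float:
    """How many times fewer pairs the budgeted scheme scores than dense attention."""
    return dense_attention_pairs(n_tokens) / budgeted_attention_pairs(
        n_tokens, budget_per_token
    )


# Hypothetical budget, chosen only so the 12M-token case lands near 1,000x.
BUDGET = 12_000

if __name__ == "__main__":
    for n in (128_000, 1_000_000, 12_000_000):
        print(f"{n:>12,} tokens: {reduction_factor(n, BUDGET):8.1f}x fewer pairs")
```

Under this simplified model the reduction factor is just context length divided by the budget, so a 1,000x saving at 12M tokens corresponds to about 12,000 scored positions per token, and the saving shrinks proportionally at shorter contexts.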
Benchmarks
All results were verified by a third party (source does not specify the firm):
- RULER 128K: 95% accuracy — compared to Claude Opus 4.6 at 94.8%.
- MRCR v2 (Multi-Round Coreference Resolution, a long-context retrieval benchmark): Production model scores 65.9; research model scores 83. Reference: Claude Opus 4.7 = 32.2, GPT 5.5 = 74, Gemini 3.1 Pro = 26.3.
- SWE-Bench Verified: 81.8%, compared to Opus 4.6 (80.8%) and DeepSeek 4.0 Pro (80.0%).
- Attention speed: SubQ Sparse Attention is 52× faster than FlashAttention in architecture-level comparison, using 63% less compute.
Architecture Details
The model uses an attention mechanism redesigned from first principles to be subquadratic. It draws on linear attention, state space models, and sparse attention but, unlike prior attempts, is reported to maintain frontier-level accuracy. The team includes PhDs from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge.
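For readers unfamiliar with linear attention, one of the ingredients named above, here is a minimal pure-Python sketch of the generic textbook form. It is not SubQ's mechanism, which the source does not describe; the feature map `phi` (elu(x) + 1) and all function names are assumptions for illustration. The idea is to replace the n × n score matrix with a running state that is updated once per token.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1, one common choice for linear attention."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(queries, keys, values):
    """Causal attention in O(n): carry a running state instead of pairwise scores."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = dot(fq, norm)  # strictly positive because phi is positive
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """The same attention weights computed pair by pair, O(n^2), for comparison."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        weights = [dot(fq, phi(keys[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(d_v)]
        )
    return outputs
```

Both functions produce the same outputs up to floating-point rounding; the difference is that the first does a constant amount of work per token, while the second revisits every earlier token. Plain linear attention of this kind has historically lagged softmax attention in accuracy, which is the gap the article says SubQ closes.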
Availability
Private beta starts today (May 5, 2026) and includes access to the API, the SubQ Code CLI, and SubQ Search. The SWE-Bench score suggests strong coding performance, which is relevant to OpenClawRadar readers who build or use AI coding agents.
📖 Read the full source: HN AI Agents