SubQ: First Fully Subquadratic LLM with 12M-Token Context and 95% RULER Accuracy

Subquadratic has released SubQ 1M-Preview, which it describes as the first fully subquadratic large language model: compute scales linearly with context length rather than quadratically, as it does in standard transformers. The company says this removes the need for RAG systems and chunking workarounds on long-context tasks. The research model supports up to 12 million tokens, and a 1M-token production model is available in early access.
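
To make the scaling claim concrete, here is a back-of-the-envelope sketch of how attention cost grows under quadratic and linear scaling. The function name `attention_cost_growth` is hypothetical, and the calculation ignores constant factors and everything outside attention, so it does not reproduce any specific figure reported by the source.

```python
from typing import Tuple


def attention_cost_growth(baseline_tokens: int, context_tokens: int) -> Tuple[float, float]:
    """Return (quadratic_growth, linear_growth) when context grows from
    baseline_tokens to context_tokens.

    Illustrative assumption: cost is proportional to n^2 for standard
    attention and to n for a linear-time mechanism, with constants ignored.
    """
    growth = context_tokens / baseline_tokens
    return growth ** 2, growth


if __name__ == "__main__":
    # Context sizes taken from the article: 1M (production) and 12M (research).
    quadratic, linear = attention_cost_growth(1_000_000, 12_000_000)
    print(f"1M -> 12M tokens: quadratic cost x{quadratic:g}, linear cost x{linear:g}")
```

Under these assumptions, a 12-fold increase in context multiplies quadratic attention cost by 144 but linear cost by only 12, which is why the gap widens as context grows.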
Key Features
- Subquadratic attention: Reduces attention compute by ~1,000x compared to frontier transformer models at 12M-token context, per the source.
- SubQ Code: CLI-based coding agent that loads entire codebases into a single context window. No multi-agent orchestration needed — plans, executes, and reviews across a full repository in one pass.
- SubQ Search: Long-context search tool offering Deep Research capabilities at chatbot speed.
- API: Full-context API for developers and enterprise teams.
Benchmarks
All results were verified by a third party (source does not specify the firm):
- RULER 128K: 95% accuracy — compared to Claude Opus 4.6 at 94.8%.
- MRCR v2 (Multi-Round Coreference Resolution, a long-context retrieval benchmark): Production model scores 65.9; research model scores 83. Reference: Claude Opus 4.7 = 32.2, GPT 5.5 = 74, Gemini 3.1 Pro = 26.3.
- SWE-Bench Verified: 81.8%, compared to Claude Opus 4.6 (80.8%) and DeepSeek 4.0 Pro (80.0%).
- Attention speed: SubQ Sparse Attention is 52× faster than FlashAttention in an architecture-level comparison, while using 63% less compute.
Architecture Details
The model uses an attention mechanism redesigned from first principles to be subquadratic. It draws on linear attention, state space models, and sparse attention but, unlike prior attempts, reportedly maintains frontier-level accuracy. The team includes PhDs from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge.
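
The source does not describe SubQ's mechanism, so the following is not its implementation. It is a minimal sketch of generic kernelized linear attention from the research literature, one of the ideas named above, showing how attention can avoid the all-pairs comparison. All function names are hypothetical, the `elu(x) + 1` feature map is an assumed choice, and the code is non-causal and unoptimized.

```python
import math
from typing import List

Vector = List[float]


def feature_map(x: Vector) -> Vector:
    # elu(x) + 1: keeps every feature positive so the normalizer is never zero.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(queries, keys, values):
    """Reference version: every query scores every key, so cost is O(n^2)."""
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        fq = feature_map(q)
        weights = [dot(fq, feature_map(k)) for k in keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) / total for c in range(dim_v)]
        )
    return outputs


def linear_attention(queries, keys, values):
    """Same result, but keys and values are folded into a fixed-size summary
    once, so cost grows linearly with sequence length."""
    dim_k, dim_v = len(keys[0]), len(values[0])
    kv_summary = [[0.0] * dim_v for _ in range(dim_k)]  # sum of phi(k) v^T
    k_sum = [0.0] * dim_k                               # sum of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for r in range(dim_k):
            k_sum[r] += fk[r]
            for c in range(dim_v):
                kv_summary[r][c] += fk[r] * v[c]
    outputs = []
    for q in queries:
        fq = feature_map(q)
        norm = dot(fq, k_sum)
        outputs.append(
            [sum(fq[r] * kv_summary[r][c] for r in range(dim_k)) / norm for c in range(dim_v)]
        )
    return outputs
```

Both functions compute the same output; the second reorders the arithmetic so that each query reads a summary whose size depends only on the feature dimensions, not on the number of tokens.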
Availability
Private beta starts today (May 5, 2026) and includes access to the API, the SubQ Code CLI, and SubQ Search. The SWE-Bench score suggests strong coding performance, which is relevant to OpenClawRadar readers who work with AI coding agents.
📖 Read the full source: HN AI Agents