SubQ: A Sub-Quadratic LLM with 12M-Token Context Window

SubQ from Subquadratic is a production-ready LLM built on a fully sub-quadratic sparse-attention architecture. It handles up to 12M tokens in a single prompt, runs at 150 tokens per second, and costs roughly 1/5 of leading models like GPT-5 or Opus.
Architecture & Benchmarks
Unlike standard transformers with O(n²) attention, SubQ uses a sub-quadratic sparse-attention mechanism that only processes relevant token relationships. At 12M tokens, this reduces attention compute by nearly 1000×. Benchmarks (third-party validated):
- SWE-Bench Verified (real-world coding): 81.8%
- RULER @ 128K (long-context accuracy): 95.0%
- MRCR v2 (8-needle, 1M): 65.9%
For comparison, SubQ's SWE-Bench score sits between Gemini 3.1 Pro (80.6%) and Opus 4.6 (80.8%). The model also outperforms Opus 4.7 (87.6%? – not reported at time) and GPT-5.5 (n/r) on MRCR v2.
Products & Integration
Two access options:
- Full-Context API: 12M-token context, streaming, tool use, OpenAI-compatible endpoints. Process entire repositories in one call at linear cost.
- SubQ Code (long-context layer for coding agents): Plug into Claude Code, Codex, or Cursor. ~25% lower bill, 10× faster exploration, auto-redirects expensive model turns. One-line install.
Who It's For
Developers and teams running AI agents that need to reason across full codebases, long PR histories, or persistent state without quality loss.
📖 Read the full source: HN AI Agents
👀 See Also

context-os: Open-source tool reduces Claude Code token consumption by 27-42%
context-os is a local context optimizer that hooks into Claude Code automatically, compressing tool output before Claude sees it and reducing token consumption by 27-42% depending on content type.

Open-source Claude Code reimplementation patched for local model compatibility
A developer patched the open-source Claude Code reimplementation to work with Ollama and local models by removing hardcoded Anthropic client dependencies. The CLI now auto-detects providers from model names and environment variables.

LobsterBoard adds theme system and template gallery
LobsterBoard now includes a theme system with five visual options and a template gallery that allows users to export and import dashboard layouts with automatic sensitive data stripping.

Comparing Local vs. Cloud AI Agents: OpenClaw and Twin.so
OpenClaw is an open-source local AI agent that runs on your machine with full data control, while Twin.so is a cloud-based platform with 200,000+ community-built agents for 24/7 automation.