The Frontier AI Race is Over: Networks of Smaller Models Beat Centralized AI on Cost and Capability

Andrew Trask argues that centralized AI companies, the ones behind models such as Fable, Mythos, GPT, and Opus, have permanently lost the capability frontier. His claim is that with routed or weighted ensembles of cheaper models, anyone can now exceed the accuracy of any single frontier model at lower cost and higher speed.
Key Findings from the Article
- Capability: A differentially private combination of frontier models scored in the low 50s on Humanity's Last Exam, higher than any single model. The article includes a chart in which an ensemble of GPT and Opus outperforms Fable/Mythos at half the price.
- Speed: OpenRouter's independent speed ratings show open-source models are faster because hosting providers compete on latency.
- Cost: The cheapest way to get Fable/Mythos-level performance is no longer those models — it's an ensemble of GPT-5.5 + Opus + Kimi K2.7, which dropped today and beats Fable on benchmarks.
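The summary does not say how the "differentially private combination" was built. As a rough illustration only, here is a minimal sketch that assumes it behaves like a noisy-argmax vote: each model's answer counts as one vote, Laplace noise is added to every candidate's count, and the highest noisy count wins. The function names, the candidate list, and the noise scale `2 / epsilon` are assumptions made for this example, and no formal privacy accounting is attempted.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_vote(answers, candidates, epsilon=1.0, rng=None):
    """Return the candidate with the highest noise-perturbed vote count.

    answers:    one answer per model (hypothetical model outputs)
    candidates: every answer that is allowed to win, voted for or not
    epsilon:    smaller values add more noise (assumed scale: 2 / epsilon)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    counts = Counter(answers)
    scale = 2.0 / epsilon
    # Noise is added to every candidate, including those with zero votes.
    noisy = {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in candidates}
    return max(noisy, key=noisy.get)
```

With a very large `epsilon` the noise becomes negligible and the function reduces to a plain majority vote; with a small `epsilon` individual votes matter less and the result becomes noisier.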
The Playbook
- Take any frontier AI model (e.g., Fable).
- Find the next-best cheaper frontier model (e.g., Opus or GPT-5.5).
- Ensemble it with a leading open-source model (e.g., Kimi K2.7) and a router.
- Result: a cheaper, more capable system, which can itself be ensembled again, indefinitely.
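The playbook above can be sketched in a few lines. This is a toy under stated assumptions, not the article's implementation: the `Member` type, the `ask` callables, the weights, and the costs are all hypothetical stand-ins for real model clients, and the "router" here ignores the prompt and simply admits members cheapest-first within a budget.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Member:
    name: str
    ask: Callable[[str], str]  # hypothetical client: prompt -> answer text
    weight: float              # vote weight, e.g. from held-out accuracy
    cost: float                # hypothetical cost per call, arbitrary units


def route(members: List[Member], budget: float) -> List[Member]:
    """Toy router: admit members cheapest-first while the budget allows."""
    chosen, spent = [], 0.0
    for m in sorted(members, key=lambda m: m.cost):
        if spent + m.cost <= budget:
            chosen.append(m)
            spent += m.cost
    return chosen


def ensemble_answer(prompt: str, members: List[Member], budget: float) -> str:
    """Weighted vote over the answers of the routed members."""
    chosen = route(members, budget)
    if not chosen:
        raise ValueError("budget too small for any member")
    scores = defaultdict(float)
    for m in chosen:
        scores[m.ask(prompt)] += m.weight
    return max(scores, key=scores.get)


# Stub models standing in for real API clients.
members = [
    Member("cheaper-frontier", lambda p: "42", weight=0.6, cost=3.0),
    Member("open-source", lambda p: "42", weight=0.5, cost=1.0),
    Member("second-frontier", lambda p: "41", weight=0.7, cost=5.0),
]

# Recursion: wrap the whole ensemble as one member of a larger ensemble.
stacked = Member("ensemble-v1", lambda p: ensemble_answer(p, members, 9.0),
                 weight=0.9, cost=9.0)
```

Because `stacked` exposes the same `ask` interface as any single model, it can be passed into another `ensemble_answer` call, which is the sense in which the playbook can be repeated. A realistic router would choose members per prompt rather than by cost alone.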
Why Centralized AI Cannot Respond: The Hydra Effect
Trask compares centralized AI to 1960s mainframes. Once the internet linked mainframes together, the network was always stronger. Similarly, once you can ensemble any combination of models, no single model can ever catch up — each improvement in a single model only feeds the ensemble.
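One way to see why an improvement in a single model "feeds the ensemble" is to look at the best case for routing. The sketch below assumes an idealized oracle router that always picks a member that answers correctly whenever one exists; under that assumption the ensemble solves the union of what its members solve, so it can never score below its best member. The per-question results are made up for illustration, and a real router would fall short of this upper bound.

```python
def accuracy(correct):
    """Fraction of True entries in a list of per-question outcomes."""
    return sum(correct) / len(correct)


def oracle_router_accuracy(results):
    """Upper bound for a router over the given models.

    results maps a model name to a list of booleans, one per question,
    saying whether that model answered the question correctly.
    """
    per_question = zip(*results.values())
    solved = [any(outcomes) for outcomes in per_question]
    return accuracy(solved)


# Made-up outcomes on four questions for two hypothetical models.
results = {
    "model_a": [True, True, False, False],
    "model_b": [False, True, True, False],
}
best_single = max(accuracy(r) for r in results.values())
bound = oracle_router_accuracy(results)
```

Here each model solves half the questions, but together they cover three of four. If either model later improves on a question, the bound stays the same or rises, never falls.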
The article explicitly states: "No single frontier AI system will ever achieve the capability frontier ever again because of how the scaling laws/ensembles work." It predicts the future is 'network-source AI' — networks of neural networks, analogous to the PC+Internet era.
📖 Read the full source: HN AI Agents