ATLAS: Open-Source Test-Time Compute Pipeline for Qwen3-14B Achieves Frontier-Level Coding Performance

ATLAS is an open-source test-time compute pipeline built around Qwen3-14B that achieves coding performance comparable to frontier models at significantly lower cost. The project was developed by a business management student at Virginia Tech who learned to code while building it.
Development Evolution
The developer spent two to three months researching hundreds of papers to connect existing research that hadn't been combined before. The system evolved through three major versions:
- V1: Basic infrastructure, described as "VERY rudimentary (essentially just RAG)"
- V2: Applied energy-based verification inspired by Anthropic's "When Models Manipulate Manifolds" paper, resulting in a decent verifier
- V3: Doubled performance over V1 baseline after extensive research including exploration of the Halting Problem
Performance Benchmarks
Results on 599 LiveCodeBench v5 problems:
- DeepSeek V3.2 Reasoning: 86.2% pass@1, ~$0.002 per task (API)
- GPT-5 (high): 84.6% pass@1, ~$0.043 per task (API)
- ATLAS V3: 74.6% pass@1, ~$0.004 per task (electricity)
- Claude 4.5 Sonnet: 71.4% pass@1, ~$0.066 per task (API)
Technical Details and Limitations
The system is "slow as hell" according to the developer. Easy tasks take seconds, but complex coding problems can take up to an hour. V3.1 is moving to Qwen 3.5 9B for improved speed and parallelization.
ATLAS includes full MaaS (Model-as-a-Service) infrastructure that allows connecting OpenCode or Claude Code via API. The developer recommends at least 16GB VRAM, warning that with less memory it will be "even slower than I mentioned."
Setup and Reproducibility
The project is fully open source with no plans for commercialization. The repository is available at https://github.com/itigges22/ATLAS. The developer notes that reproducibility needs work, but suggests that "if you ask Claude Code to optimize it for your setup it should work fine."
📖 Read the full source: r/LocalLLaMA
👀 See Also

blend-ai: New Blender MCP Service for Claude Code
blend-ai is a new Blender MCP service that allows Claude Code to generate 3D scenes. A user reported it worked faster and better than blender-mcp, creating a shuttle launch scene from reference images in 5 minutes.

Local PII Redaction Skill for OpenClaw Uses GLiNER Model
A new OpenClaw skill intercepts outgoing responses and runs them through the local nvidia/gliner-PII model to detect and redact sensitive information like API keys and PII, replacing them with labels like [API_KEY] and adding removal notices.

AGENTS.md Schema for LLM-Compiled Knowledge Bases with Learning Layer
AGENTS.md v1.0 provides a schema standard for Claude to build and maintain personal research wikis from raw sources, including a spaced repetition learning layer with automatic flashcard generation and knowledge gap tracking.

MCP-Loci: Local Persistent Memory Server for Claude and MCP-Compatible AI
MCP-Loci is a persistent memory server that solves Claude's session-based memory limitation with five tools: remember, recall, forget, synthesize, and health. It uses hybrid BM25 keyword matching and semantic embeddings for accurate recall without requiring API keys.