TREX: Greptile's AI Code Reviewer That Runs Your Code

Greptile has released TREX (Test, Run, Execute), an execution layer that runs your code during AI-powered code review. Instead of only reading diffs, TREX executes the changed code and surfaces runtime bugs that static analysis can't catch, such as UI regressions, state-dependent logic errors, and race conditions.
Architecture: Orchestrator + Per-Issue Subagents
Early versions tried separate agents or a single combined agent. Both failed: separate agents duplicated work with no shared context; a single agent got overloaded managing setup, screenshots, and tests. The solution was an orchestrator agent (the main Greptile reviewer) that reads the diff, identifies suspicious issues, and spins up a dedicated TREX subagent per issue, all running in parallel. Each subagent inherits the orchestrator's context and has its own context window scoped to its specific investigation.
Example: a UI feature behind an auth gate. A subagent autonomously sets up the environment, handles authentication, toggles feature flags, and returns a screenshot of the rendered feature.
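The orchestrator-plus-subagents pattern described above can be sketched in Python with asyncio. This is a minimal illustration under stated assumptions, not Greptile's implementation: the names Issue, SubagentContext, identify_issues, run_subagent, and orchestrate are hypothetical, the "suspicious issue" heuristic is a toy that flags every added diff line, and the subagent body is a placeholder for real setup and execution work.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A suspicious spot flagged in the diff (hypothetical shape)."""
    issue_id: int
    description: str


@dataclass
class SubagentContext:
    """Per-issue context: a copy of the shared context plus the one issue in scope."""
    shared: dict
    issue: Issue
    notes: list = field(default_factory=list)


def identify_issues(diff: str) -> list[Issue]:
    """Toy heuristic (an assumption): every added line counts as one issue."""
    added = [
        line[1:].strip()
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return [Issue(issue_id=n, description=text) for n, text in enumerate(added)]


async def run_subagent(ctx: SubagentContext) -> dict:
    """Placeholder for one subagent investigating exactly one issue."""
    ctx.notes.append(f"investigating: {ctx.issue.description}")
    await asyncio.sleep(0)  # stands in for environment setup and code execution
    return {"issue_id": ctx.issue.issue_id, "notes": ctx.notes}


async def orchestrate(diff: str) -> list[dict]:
    """Read the diff, flag issues, and run one subagent per issue concurrently."""
    shared = {"diff": diff}
    contexts = [
        SubagentContext(shared=dict(shared), issue=issue)  # inherit by copying
        for issue in identify_issues(diff)
    ]
    return await asyncio.gather(*(run_subagent(ctx) for ctx in contexts))
```

Calling asyncio.run(orchestrate(diff)) returns one result per flagged issue, in the order the issues were identified. Copying the shared context into each subagent is one way to model inheritance while keeping each investigation's notes separate.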
Multi-Modal Artifacts vs. Bullet Points
Initial TREX output was a bullet-point summary. Bullet points allowed hallucinations (for example, claiming a test passed when it hadn't) and gave reviewers no way to verify a claim. The fix: each TREX finding is backed by a set of multi-modal artifacts: screenshots, execution logs, API traces, and execution scripts. Each modality tells part of the story, making it possible to trace exactly what happened. The first artifact that impressed the team was a video capture of an animation change, which showed the actual runtime effect.
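One way to picture the difference between a bare bullet point and an artifact-backed finding is a small data model that refuses to report claims without evidence. The sketch below is an assumption for illustration only: the types ArtifactKind, Artifact, and Finding, the reportable helper, and the file paths are all hypothetical and do not reflect TREX's actual output format.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArtifactKind(Enum):
    """The artifact types named in the text."""
    SCREENSHOT = "screenshot"
    EXECUTION_LOG = "execution_log"
    API_TRACE = "api_trace"
    EXECUTION_SCRIPT = "execution_script"


@dataclass(frozen=True)
class Artifact:
    kind: ArtifactKind
    path: str  # hypothetical location of the captured evidence


@dataclass
class Finding:
    """A claim plus the evidence that lets a reviewer check it."""
    claim: str
    artifacts: list[Artifact] = field(default_factory=list)

    def is_verifiable(self) -> bool:
        # A claim with no attached artifact cannot be checked by a reviewer.
        return len(self.artifacts) > 0

    def kinds(self) -> set[ArtifactKind]:
        return {artifact.kind for artifact in self.artifacts}


def reportable(findings: list[Finding]) -> list[Finding]:
    """Keep only findings backed by at least one artifact."""
    return [finding for finding in findings if finding.is_verifiable()]
```

Under this model, a claim such as "all tests passed" with no execution log attached would be filtered out rather than shown to the reviewer.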
What It Catches
TREX targets bugs that don't appear in code diffs: logic errors that require specific state sequences, UI regressions that only show up after page load, and race conditions that need real requests to trigger. It generates and runs tests, but the focus is on finding bugs, not just writing tests. Each subagent works out the required setup on its own.
As Shlok Mehrotra, the engineer behind TREX, puts it: "You can read the diff perfectly and still miss these types of bugs completely."
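To make the race-condition case concrete, here is a small, self-contained Python sketch of a lost update. It is an illustrative example, not taken from Greptile: Counter, unsafe_increment, safe_increment, and run_concurrent are hypothetical names, and asyncio.sleep(0) stands in for the real request that would sit between the read and the write. Each function looks reasonable line by line, and the bug only shows when several calls actually run concurrently.

```python
import asyncio


class Counter:
    def __init__(self) -> None:
        self.value = 0


async def unsafe_increment(counter: Counter) -> None:
    current = counter.value      # read
    await asyncio.sleep(0)       # yield control, as a real request would
    counter.value = current + 1  # write back a value that may be stale


async def safe_increment(counter: Counter, lock: asyncio.Lock) -> None:
    async with lock:             # serialize the read-modify-write sequence
        current = counter.value
        await asyncio.sleep(0)
        counter.value = current + 1


async def run_concurrent(n: int, safe: bool) -> int:
    """Run n increments concurrently and return the final counter value."""
    counter = Counter()
    lock = asyncio.Lock()
    if safe:
        tasks = [safe_increment(counter, lock) for _ in range(n)]
    else:
        tasks = [unsafe_increment(counter) for _ in range(n)]
    await asyncio.gather(*tasks)
    return counter.value
```

With five concurrent unsafe increments, every task reads 0 before any task writes, so the counter ends at 1 instead of 5. The locked version ends at 5. Nothing in a diff of unsafe_increment signals the problem; it takes execution under concurrency to expose it.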
📖 Read the full source: HN AI Agents