TREX: AI Code Reviewer That Actually Runs Your Code

Greptile released TREX (Test, Run, Execute), an execution layer that runs your code during AI-powered code review. Instead of only reading diffs, TREX executes the changed code and surfaces runtime bugs (UI regressions, state-dependent logic errors, race conditions) that reading a diff alone tends to miss.

Architecture: Orchestrator + Per-Issue Subagents

Early versions tried separate agents or a single combined agent. Both failed: separate agents duplicated work with no shared context; a single agent got overloaded managing setup, screenshots, and tests. The solution was an orchestrator agent (the main Greptile reviewer) that reads the diff, identifies suspicious issues, and spins up a dedicated TREX subagent per issue, all running in parallel. Each subagent inherits the orchestrator's context and has its own context window scoped to its specific investigation.

Example: a UI feature behind an auth gate. A subagent autonomously sets up the environment, handles authentication, toggles feature flags, and returns a screenshot of the rendered feature.
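
The orchestrator-plus-subagents pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Greptile's implementation: the names Issue, Finding, identify_issues, run_subagent, and orchestrate are hypothetical, the "suspicious issue" heuristic simply flags every added line, and the execution step is a placeholder.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """A suspicious spot the orchestrator flagged while reading the diff."""
    issue_id: int
    description: str


@dataclass
class Finding:
    issue_id: int
    summary: str
    context_size: int  # number of context entries the subagent ended with


def identify_issues(diff: str) -> list[Issue]:
    # Hypothetical heuristic: treat every added line as worth investigating.
    added = [
        line[1:].strip()
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return [Issue(i, text) for i, text in enumerate(added)]


async def run_subagent(issue: Issue, shared_context: tuple[str, ...]) -> Finding:
    # Each subagent starts from a copy of the orchestrator's context and then
    # grows its own private context, scoped to this one issue.
    local_context = list(shared_context)
    local_context.append(f"investigating: {issue.description}")
    await asyncio.sleep(0)  # placeholder for environment setup and execution
    local_context.append("ran changed code")
    return Finding(issue.issue_id, f"checked {issue.description}", len(local_context))


async def orchestrate(diff: str) -> list[Finding]:
    # The orchestrator reads the diff once, then fans out one subagent per issue.
    shared_context = ("repo conventions", f"diff with {len(diff.splitlines())} lines")
    issues = identify_issues(diff)
    return await asyncio.gather(*(run_subagent(i, shared_context) for i in issues))


if __name__ == "__main__":
    for finding in asyncio.run(orchestrate("+a = 1\n context\n+b = 2")):
        print(finding)
```

The shared context is an immutable tuple, so subagents can inherit it without being able to interfere with one another, while each keeps its own list for the details of its investigation.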

Multi-Modal Artifacts vs. Bullet Points

Initial TREX output was bullet-point summaries, but these allowed hallucinations (e.g., claiming a test passed when it hadn't) and gave reviewers no way to verify the claims. The fix was to back each TREX finding with a set of multi-modal artifacts: screenshots, execution logs, API traces, and execution scripts. Each modality tells part of the story, making it possible to trace exactly what happened. The first artifact that impressed the team was a video capture of an animation change, showing the actual runtime effect.
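
One way to picture the artifact requirement is as a data structure in which a claim without evidence is never reported. The sketch below is an assumption for illustration only: ArtifactKind, Artifact, Finding, publishable, and the file paths are hypothetical names, not TREX's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArtifactKind(Enum):
    # The four modalities named in the text above.
    SCREENSHOT = "screenshot"
    EXECUTION_LOG = "execution_log"
    API_TRACE = "api_trace"
    EXECUTION_SCRIPT = "execution_script"


@dataclass(frozen=True)
class Artifact:
    kind: ArtifactKind
    path: str  # hypothetical location of the captured evidence


@dataclass
class Finding:
    claim: str
    artifacts: list[Artifact] = field(default_factory=list)

    def is_verifiable(self) -> bool:
        # A claim counts only if at least one artifact backs it.
        return len(self.artifacts) > 0


def publishable(findings: list[Finding]) -> list[Finding]:
    # Drop bare claims such as "all tests passed" that carry no evidence.
    return [f for f in findings if f.is_verifiable()]


if __name__ == "__main__":
    backed = Finding(
        "login button overlaps footer",
        [Artifact(ArtifactKind.SCREENSHOT, "shots/login.png")],
    )
    bare = Finding("all tests passed")
    print([f.claim for f in publishable([backed, bare])])
```

Filtering on evidence rather than on the wording of the claim is the point of the design: a reviewer can open the artifact and check it, which a bullet point alone does not allow.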

What It Catches

TREX targets bugs that don't appear in code diffs: logic errors that require specific state sequences, UI regressions that show up only after page load, and race conditions that surface only under real requests. It generates and runs tests, but the focus is on finding bugs, not just writing tests. Each subagent figures out the required setup on its own.
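
To make the race-condition case concrete, here is a small, self-contained illustration of a bug that looks harmless in a diff and only shows up when the code runs under concurrent calls. It is a generic lost-update example written for this summary, not code from TREX; Counter, run_sequential, and run_concurrent are hypothetical names, and asyncio.sleep(0) stands in for a real network or database call.

```python
import asyncio


class Counter:
    def __init__(self) -> None:
        self.value = 0

    async def increment(self) -> None:
        current = self.value      # read
        await asyncio.sleep(0)    # yield control, standing in for real I/O
        self.value = current + 1  # write back a possibly stale value


async def run_sequential(n: int) -> int:
    # One call at a time: every read sees the previous write.
    counter = Counter()
    for _ in range(n):
        await counter.increment()
    return counter.value


async def run_concurrent(n: int) -> int:
    # Overlapping calls: every task reads before any task writes.
    counter = Counter()
    await asyncio.gather(*(counter.increment() for _ in range(n)))
    return counter.value


if __name__ == "__main__":
    print("sequential:", asyncio.run(run_sequential(5)))
    print("concurrent:", asyncio.run(run_concurrent(5)))
```

Five sequential calls leave the counter at 5, while five concurrent calls leave it at 1, because each task reads the value before any of them writes. Nothing in the increment method looks wrong when read in isolation, which is why this class of bug needs execution to surface.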

As Shlok Mehrotra, the engineer behind TREX, puts it: "You can read the diff perfectly and still miss these types of bugs completely."

📖 Read the full source: HN AI Agents