Agent-Xray: Open-source tool for debugging AI agent failures from trace logs

Agent-Xray is an open-source tool for debugging AI agents by analyzing their trace logs. It was created to solve the problem of agents failing tasks without clear errors—situations where code runs fine but the agent makes wrong decisions, like repeatedly calling the wrong tool despite error messages suggesting the correct one.
Key Features
The tool reads trace logs and provides structural grading and root-cause classification for agent failures. It reconstructs what the agent was seeing at each step to help understand why bad decisions were made.
Failure Categories
- spin
- tool_bug
- early_abort
Enforcement Mode
The most significant feature according to the creator is enforcement mode. After fixing an agent bug, this mode runs adversarial challenges against your fixes to verify they're legitimate. It checks for:
- Hardcoded returns
- Weakened assertions
This addresses the problem where fixes might work on specific test tasks but are actually fragile, or where agents learn to game the test.
Workflow Integration
The tool runs as MCP tools, allowing Claude Code to use it directly. A typical workflow described in the source:
- Tell Claude Code to triage agent traces
- It finds the worst failure
- Replays what the agent saw
- Suggests a fix
- Enforcement mode verifies the fix is legitimate
The creator describes this as "agents debugging agents."
Technical Details
- Installation:
pip install agent-xray - Quickstart:
agent-xray quickstart(includes sample traces to test without your own data) - License: MIT
- Zero dependencies
- Runs offline
- Works with OpenAI, Anthropic, LangChain, CrewAI, OpenTelemetry traces
- Project age: About 9 days old at time of posting
Use Case
This tool is for developers working with AI agents who need to debug failures that don't produce traditional errors or stack traces—situations where agents make incorrect decisions despite having access to correct tools and information.
📖 Read the full source: r/ClaudeAI
👀 See Also

Open-source Claude Code skill /unzuck curates social media feeds into dashboard
A free, open-source Claude Code skill called /unzuck scans feeds across Hacker News, Reddit, LinkedIn, YouTube, Twitter/X, Instagram, and Facebook in parallel using browser automation, scores items against user interest profiles, and generates interactive HTML dashboards.

Structured Reasoning Template Improves AI Code Review Accuracy
A Reddit user shares a structured reasoning template adapted from Meta research that forces AI models to complete specific analytical steps before generating code reviews, improving accuracy by 5-12 percentage points according to arXiv:2603.01896.

Argus: A GitHub App That Reviews CLAUDE.md Files and Posts Scores on PRs
Argus is a GitHub App built with Claude Code that reviews CLAUDE.md files and posts a score on every pull request. After testing on multiple repositories, the most common failures are missing explicit scope limits and escalation paths.
Claude Code Skill Tax: 2,596 Installed Skills, 40 Used, $91/Month Wasted
Every installed Claude Code skill loads into every session's system prompt. One user measured 102,651 tokens loaded per session with 98.6% never used, costing ~$91/month. An open-source tool, skill-tax, audits usage and cost.