Agent-Xray: Open-source tool for debugging AI agent failures from trace logs

✍️ OpenClawRadar📅 Published: April 15, 2026🔗 Source
Agent-Xray: Open-source tool for debugging AI agent failures from trace logs
Ad

Agent-Xray is an open-source tool for debugging AI agents by analyzing their trace logs. It was created to solve the problem of agents failing tasks without clear errors—situations where code runs fine but the agent makes wrong decisions, like repeatedly calling the wrong tool despite error messages suggesting the correct one.

Key Features

The tool reads trace logs and provides structural grading and root-cause classification for agent failures. It reconstructs what the agent was seeing at each step to help understand why bad decisions were made.

Failure Categories

  • spin
  • tool_bug
  • early_abort

Enforcement Mode

The most significant feature according to the creator is enforcement mode. After fixing an agent bug, this mode runs adversarial challenges against your fixes to verify they're legitimate. It checks for:

  • Hardcoded returns
  • Weakened assertions

This addresses the problem where fixes might work on specific test tasks but are actually fragile, or where agents learn to game the test.

Ad

Workflow Integration

The tool runs as MCP tools, allowing Claude Code to use it directly. A typical workflow described in the source:

  1. Tell Claude Code to triage agent traces
  2. It finds the worst failure
  3. Replays what the agent saw
  4. Suggests a fix
  5. Enforcement mode verifies the fix is legitimate

The creator describes this as "agents debugging agents."

Technical Details

  • Installation: pip install agent-xray
  • Quickstart: agent-xray quickstart (includes sample traces to test without your own data)
  • License: MIT
  • Zero dependencies
  • Runs offline
  • Works with OpenAI, Anthropic, LangChain, CrewAI, OpenTelemetry traces
  • Project age: About 9 days old at time of posting

Use Case

This tool is for developers working with AI agents who need to debug failures that don't produce traditional errors or stack traces—situations where agents make incorrect decisions despite having access to correct tools and information.

📖 Read the full source: r/ClaudeAI

Ad

👀 See Also