Automate Datadog Alert Triage with Claude Code and MCP

A developer at Quickchat created an automated system to handle morning Datadog alert triage using Claude Code and the Model Context Protocol (MCP). The system eliminates manual checking of Datadog dashboards by having AI agents analyze alerts, classify issues, and open pull requests with fixes.

Setup Components

The implementation involves three main components:

1. Datadog MCP Server Integration

Datadog provides a remote MCP server with OAuth authentication. Configuration requires one file in the repository root:

// .mcp.json
{
  "mcpServers": {
    "datadog": {
      "type": "http",
      "url": "https://mcp.datadoghq.eu/api/unstable/mcp-server/mcp"
    }
  }
}

Developers authenticate with a single browser click. For US1 region users, replace datadoghq.eu with datadoghq.com.

2. Claude Code Skill for Triage

A skill file at .claude/skills/triage-datadog defines the triage workflow in four phases:

Gather: Check Datadog for monitors, error logs, and incidents from the last 24 hours
Classify: Sort findings into three categories: Actionable (code bugs), Infrastructure (server problems), and Noise (transient blips)
Fix: For each real bug, spin up an AI agent in an isolated git worktree to find root causes, write fixes with tests, and open PRs
Report: Summarize findings in a table format

Agents run in parallel to avoid sequential waiting.

3. Cron Job Automation

The system runs automatically on weekdays at 8 AM with this crontab entry:

3 8 * * 1-5 claude -p --dangerously-skip-permissions '/triage-datadog'

The -p flag prints output without conversation, and --dangerously-skip-permissions allows the agent to proceed without human approval for each file read. Each agent runs in a sandboxed macbox session with scoped git worktrees, no access to production infrastructure, secrets, or deployment pipelines.

For additional security, tools can be restricted with an explicit allowlist:

claude -p --dangerously-skip-permissions --allowedTools "Bash(git:*) Bash(gh:*) Edit Read Grep Glob Agent" '/triage-datadog'

The developer reports the entire setup took about 30 minutes to implement.

📖 Read the full source: HN AI Agents

Automating Datadog Alert Triage with Claude Code and MCP

Setup Components

1. Datadog MCP Server Integration

2. Claude Code Skill for Triage

3. Cron Job Automation

👀 See Also

Voygr Launches Business Validation API for Fresh Place Intelligence

User Experience: Switching from OpenClaw to Hermes Agent on Local LLM

Open-source methodology for agentic AI partnership with Claude

How Claude Helped Reverse-Engineer Garmin’s BLE Protocols to Fake a Native Running Sensor