Automating Datadog Alert Triage with Claude Code and MCP

A developer at Quickchat built an automated system for the morning Datadog alert triage using Claude Code and the Model Context Protocol (MCP). Instead of someone checking Datadog dashboards by hand, AI agents analyze the alerts, classify the issues, and open pull requests with fixes.
Setup Components
The implementation involves three main components:
1. Datadog MCP Server Integration
Datadog provides a remote MCP server with OAuth authentication. Configuration requires a single file, .mcp.json, in the repository root:
{
  "mcpServers": {
    "datadog": {
      "type": "http",
      "url": "https://mcp.datadoghq.eu/api/unstable/mcp-server/mcp"
    }
  }
}
Each developer authenticates with a single click in the browser. For the US1 region, replace datadoghq.eu with datadoghq.com in the URL.
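If one repository is shared by people on different Datadog regions, the config can be generated rather than hand-edited. The Python sketch below is a hypothetical helper (mcp_config and write_mcp_config are names invented for illustration); it only knows the EU and US1 hosts mentioned above and assumes the URL path is identical on both.

```python
import json
from pathlib import Path

# Hypothetical region table: only the EU and US1 hosts come from the setup above.
DATADOG_MCP_HOSTS = {
    "eu": "mcp.datadoghq.eu",
    "us1": "mcp.datadoghq.com",
}
MCP_PATH = "/api/unstable/mcp-server/mcp"


def mcp_config(region: str) -> dict:
    """Return the .mcp.json contents for one Datadog region."""
    host = DATADOG_MCP_HOSTS.get(region.lower())
    if host is None:
        raise ValueError(f"unsupported Datadog region: {region!r}")
    return {
        "mcpServers": {
            "datadog": {"type": "http", "url": f"https://{host}{MCP_PATH}"}
        }
    }


def write_mcp_config(repo_root: Path, region: str) -> Path:
    """Write .mcp.json (plain JSON, so no comments) into the repository root."""
    path = repo_root / ".mcp.json"
    path.write_text(json.dumps(mcp_config(region), indent=2) + "\n")
    return path


if __name__ == "__main__":
    print(json.dumps(mcp_config("us1"), indent=2))
```

Because JSON has no comment syntax, the file name lives in the helper rather than inside the file.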
2. Claude Code Skill for Triage
A skill file at .claude/skills/triage-datadog defines the triage workflow in four phases:
- Gather: Check Datadog for monitors, error logs, and incidents from the last 24 hours
- Classify: Sort findings into three categories: Actionable (code bugs), Infrastructure (server problems), and Noise (transient blips)
- Fix: For each real bug, spin up an AI agent in an isolated git worktree to find the root cause, write a fix with tests, and open a PR
- Report: Summarize findings in a table format
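In the skill, Claude itself decides which bucket each finding belongs to. The Python sketch below only makes the three categories and the table-style report concrete; the Finding fields, the NOISE_THRESHOLD value, and the rules inside classify are hypothetical stand-ins, not the author's logic.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ACTIONABLE = "Actionable"
    INFRASTRUCTURE = "Infrastructure"
    NOISE = "Noise"


@dataclass
class Finding:
    # Hypothetical shape of one item gathered from Datadog.
    title: str
    occurrences: int        # hits in the last 24 hours
    has_stack_trace: bool   # error log points into application code
    infra_signal: bool      # e.g. host down, disk full, OOM kill


# Hypothetical threshold: below this many hits, a code error counts as a blip.
NOISE_THRESHOLD = 3


def classify(f: Finding) -> Category:
    """Sort one finding into Actionable, Infrastructure, or Noise."""
    if f.infra_signal:
        return Category.INFRASTRUCTURE
    if f.has_stack_trace and f.occurrences >= NOISE_THRESHOLD:
        return Category.ACTIONABLE
    return Category.NOISE


def report(findings: list[Finding]) -> str:
    """Render the Report phase as a Markdown table, most frequent first."""
    rows = ["| Finding | Hits | Category |", "|---|---|---|"]
    for item in sorted(findings, key=lambda item: item.occurrences, reverse=True):
        rows.append(f"| {item.title} | {item.occurrences} | {classify(item).value} |")
    return "\n".join(rows)
```

Only findings classified as Actionable would go on to the Fix phase; the other two buckets end up in the report alone.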
The fix agents run in parallel rather than one after another.
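In the article, the parallel fan-out happens inside Claude Code (the allowlist later in this section includes the Agent tool). As a rough external analogue, the Python sketch below creates one git worktree per bug and then launches a separate non-interactive claude process in each, concurrently. The branch and directory naming (triage/<id>, wt-<id>), the per-bug prompts, and running one claude -p per bug are assumptions for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Sequence

# A runner executes one command (argv) in a working directory.
# It is a parameter so the orchestration can be tested with a fake.
Runner = Callable[[Sequence[str], Path], None]


def run(argv: Sequence[str], cwd: Path) -> None:
    subprocess.run(list(argv), cwd=cwd, check=True)


def add_worktree(repo: Path, bug_id: str, runner: Runner) -> Path:
    """Check out a new branch for one bug in its own directory next to the repo."""
    worktree = repo.parent / f"wt-{bug_id}"   # hypothetical naming scheme
    branch = f"triage/{bug_id}"               # hypothetical naming scheme
    runner(["git", "worktree", "add", "-b", branch, str(worktree)], repo)
    return worktree


def fan_out(repo: Path, bugs: dict[str, str], runner: Runner = run,
            max_workers: int = 4) -> list[Path]:
    """Create one worktree per bug, then run one agent per worktree in parallel."""
    # Sequential setup: concurrent `git worktree add` calls may contend for locks.
    worktrees = {bug_id: add_worktree(repo, bug_id, runner) for bug_id in bugs}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            # Hypothetical per-bug prompt; flags mirror the crontab command.
            pool.submit(runner,
                        ["claude", "-p", "--dangerously-skip-permissions", bugs[bug_id]],
                        wt)
            for bug_id, wt in worktrees.items()
        ]
        for future in futures:
            future.result()  # surface any agent failure as an exception
    return list(worktrees.values())
```

Only the agent runs are parallel; worktree creation stays sequential to keep git's own bookkeeping simple. Each agent edits only its own checkout, so parallel fixes cannot overwrite one another's files.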
3. Cron Job Automation
The system runs automatically on weekdays at 8:03 AM with this crontab entry:
3 8 * * 1-5 claude -p --dangerously-skip-permissions '/triage-datadog'
The -p (print) flag runs the prompt non-interactively and prints the response instead of opening a conversation, and --dangerously-skip-permissions skips all permission prompts, so the agent can read and edit files and run commands without waiting for human approval. Each agent runs in a sandboxed macbox session with a scoped git worktree and no access to production infrastructure, secrets, or deployment pipelines.
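The first crontab field is the minute and the second is the hour, so 3 8 * * 1-5 fires at 08:03, Monday through Friday (cron numbers weekdays from 0 for Sunday). The small Python matcher below checks that reading. It is an illustration, not a cron replacement: it handles only *, single values, ranges, and comma lists, not step values such as */15.

```python
from datetime import datetime


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', 'N', 'A-B', or comma lists of those)."""
    if field == "*":
        return set(range(lo, hi + 1))
    values: set[int] = set()
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    return values


def matches(expr: str, when: datetime) -> bool:
    """True if a five-field cron expression fires at `when` (minute precision)."""
    minute, hour, dom, month, dow = expr.split()
    dows = parse_field(dow, 0, 7)
    if 7 in dows:                          # cron accepts 7 as well as 0 for Sunday
        dows.add(0)
    cron_dow = (when.weekday() + 1) % 7    # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in parse_field(dom, 1, 31)
    dow_ok = cron_dow in dows
    # Classic cron rule: if both day fields are restricted, either one may match.
    day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
    return (
        when.minute in parse_field(minute, 0, 59)
        and when.hour in parse_field(hour, 0, 23)
        and when.month in parse_field(month, 1, 12)
        and day_ok
    )
```

For example, matches("3 8 * * 1-5", datetime(2025, 6, 2, 8, 3)) is true for that Monday morning, while the same call at 8:00 or on the following Saturday is false.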
For additional security, tools can be restricted with an explicit allowlist:
claude -p --dangerously-skip-permissions --allowedTools "Bash(git:*) Bash(gh:*) Edit Read Grep Glob Agent" '/triage-datadog'
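cron typically starts jobs in the user's home directory with a minimal PATH, while Claude Code picks up the project's .mcp.json and .claude/skills from the directory it runs in. Adding a cd into the repository to the crontab line is enough; the Python sketch below is one alternative that runs the allowlisted command above from the repository root and caps its runtime. REPO, the one-hour timeout, and the wrapper itself are assumptions, not part of the original setup.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the repository holding .mcp.json and .claude/skills.
REPO = Path.home() / "src" / "app"

ALLOWED_TOOLS = "Bash(git:*) Bash(gh:*) Edit Read Grep Glob Agent"


def triage_argv(allowed_tools: str = ALLOWED_TOOLS) -> list[str]:
    """The allowlisted command as an argv list, so no shell quoting is needed."""
    return [
        "claude", "-p", "--dangerously-skip-permissions",
        "--allowedTools", allowed_tools,
        "/triage-datadog",
    ]


def main() -> int:
    # Run from the repo root; raises subprocess.TimeoutExpired (after killing
    # the child) if the run exceeds the hypothetical one-hour cap.
    result = subprocess.run(triage_argv(), cwd=REPO, timeout=60 * 60)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```

The crontab entry would then call the wrapper instead, for example 3 8 * * 1-5 /usr/bin/python3 /path/to/run_triage.py (paths hypothetical). Under cron's minimal PATH, claude may also need to be given as an absolute path.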
The developer reports the entire setup took about 30 minutes to implement.
📖 Read the full source: HN AI Agents