MCP as Observability Interface: Connecting AI Agents to Kernel Tracepoints

The Model Context Protocol (MCP) is becoming the interface between AI agents and infrastructure data. In March 2026, three significant developments highlighted this trend: Datadog shipped an MCP server connecting real-time observability data to AI agents for automated detection and remediation, Qualys published a security analysis calling MCP servers "the new shadow IT for AI," and Microsoft Retina demonstrated eBPF-based Kubernetes network observability.
Two Approaches to MCP Observability
There are two ways to connect observability data to AI agents via MCP:
- Approach 1: Wrap existing platforms - Datadog's strategy takes existing metrics, logs, and traces already collected and aggregated, and exposes them through MCP tools. The AI agent queries the dashboard API, gets pre-processed data, and acts on it. This works for teams with mature observability stacks wanting AI-powered automation on top.
- Approach 2: Build MCP-native observability - Instead of wrapping an existing platform, build an eBPF agent that traces user-space library calls via uprobes and kernel events via tracepoints, stores results in SQLite, and exposes everything through MCP tools. MCP becomes the primary interface, not an adapter layer.
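To make the second approach concrete, here is a minimal sketch, assuming events are stored in a SQLite table and a single handler answers MCP-style `tools/call` requests directly from it. The `events` schema, the `open_trace_db` and `handle_request` helpers, and the sample rows are all hypothetical. The JSON-RPC envelope follows the general shape of an MCP tool call but omits initialization, tool listing, and transport.

```python
import json
import sqlite3


def open_trace_db() -> sqlite3.Connection:
    """Create an in-memory trace store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, api TEXT, duration_ms REAL)"
    )
    # Illustrative rows only; a real agent would insert events from eBPF probes.
    conn.executemany(
        "INSERT INTO events (api, duration_ms) VALUES (?, ?)",
        [("cudaMemcpyAsync", 120.0), ("cudaLaunchKernel", 0.4), ("cudaMalloc", 2.5)],
    )
    conn.commit()
    return conn


def get_trace_stats(conn: sqlite3.Connection) -> dict:
    """Summarize the trace: event count and total recorded duration."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(duration_ms), 0) FROM events"
    ).fetchone()
    return {"event_count": count, "total_duration_ms": total}


# Tool registry: the tool name seen by the agent maps to a plain function.
TOOLS = {"get_trace_stats": get_trace_stats}


def handle_request(conn: sqlite3.Connection, request: dict) -> dict:
    """Answer one JSON-RPC 'tools/call' request from the trace database."""
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if request.get("method") != "tools/call" or tool is None:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "unknown method or tool"},
        }
    result = tool(conn, **params.get("arguments", {}))
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
    }
```

In this sketch the tool functions query the trace store directly, with no dashboard API in between, which is what separates this approach from the wrapper approach.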
MCP-Native Observability in Practice
The article details a concrete example tracing a vLLM TTFT regression where the first token took 14.5x longer than baseline. The trace database captured every CUDA API call, kernel context switch, and memory allocation. When Claude connects to the MCP server and loads this database, it can use four specific tools:
- get_trace_stats - See the full trace summary: 12,847 CUDA events, 4 causal chains, total GPU time
- get_causal_chains - Read the causal chains that explain why latency spiked, in plain English
- run_sql - Run custom queries against raw event data (e.g., "show me all cudaMemcpyAsync calls over 100ms")
- get_stacks - Inspect call stacks for any flagged event
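A tool like run_sql could be sketched as follows, assuming the trace lives in SQLite. The `events` schema, the sample rows, and the `build_demo_db` helper are hypothetical and are not taken from the implementation the article describes. The sketch uses SQLite's `PRAGMA query_only` so that agent-supplied queries cannot modify the trace, and caps the number of rows returned.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Build an in-memory trace database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, api TEXT, duration_ms REAL)"
    )
    conn.executemany(
        "INSERT INTO events (api, duration_ms) VALUES (?, ?)",
        [
            ("cudaMemcpyAsync", 120.0),
            ("cudaMemcpyAsync", 3.2),
            ("cudaMemcpyAsync", 340.5),
            ("cudaLaunchKernel", 0.4),
        ],
    )
    conn.commit()
    # From here on, SQLite rejects statements that would change the data.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_sql(conn: sqlite3.Connection, query: str, max_rows: int = 100) -> list:
    """Run an agent-supplied query and return at most max_rows rows as dicts."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description or []]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


if __name__ == "__main__":
    db = build_demo_db()
    # The article's example question, expressed against the assumed schema.
    slow_copies = run_sql(
        db,
        "SELECT api, duration_ms FROM events "
        "WHERE api = 'cudaMemcpyAsync' AND duration_ms > 100 "
        "ORDER BY duration_ms",
    )
    for row in slow_copies:
        print(row)
```

This is not a complete sandbox: a query could still switch the pragma off, so a production server would more likely open the database file in read-only mode or restrict which statements it accepts.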
Claude identified the root cause in under 30 seconds: logprobs computation was blocking the decode loop, creating a 256x slowdown on the critical path. This root cause wasn't visible in aggregate metrics, only in raw causal chains between specific CUDA API calls.
Security Considerations
Qualys found that over 53% of MCP servers rely on static secrets for authentication and recommended adding observability to MCP servers: logging capability discovery events, monitoring invocation patterns, and alerting on anomalies. For MCP servers accessing GPU infrastructure, the attack surface includes timing information, memory layouts, and model architecture details.
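The recommendation to log invocations and alert on anomalies can be illustrated with a small sketch. The `InvocationMonitor` class, its sliding-window threshold, and the client and tool names are hypothetical. A real deployment would likely use richer signals than call rate alone and would ship the audit log to a durable store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class InvocationMonitor:
    """Audit-log MCP tool calls and flag bursts within a sliding time window."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        # (client_id, tool) -> timestamps of recent calls
        self.calls = defaultdict(deque)
        self.audit_log = []

    def record(self, client_id: str, tool: str, now: Optional[float] = None) -> bool:
        """Log one invocation; return True if the call rate looks anomalous."""
        now = time.monotonic() if now is None else now
        window = self.calls[(client_id, tool)]
        window.append(now)
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window_s:
            window.popleft()
        anomalous = len(window) > self.max_calls
        self.audit_log.append(
            {"ts": now, "client": client_id, "tool": tool, "alert": anomalous}
        )
        return anomalous


if __name__ == "__main__":
    monitor = InvocationMonitor(max_calls=3, window_s=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        if monitor.record("agent-a", "run_sql", now=t):
            print(f"alert: agent-a exceeded the run_sql call budget at t={t}")
```

Capability discovery can be logged through the same path by recording the discovery request as if it were a tool name, so that an agent enumerating tools unusually often shows up in the same audit trail.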
In Ingero's implementation, every MCP tool invocation is traced using the same eBPF infrastructure that captures GPU events, creating a unified observability pipeline rather than a separate logging layer.
📖 Read the full source: HN AI Agents