Kelet: Automated Root Cause Analysis for AI Agents

What Kelet Does
Kelet is a service that continuously monitors AI agents and LLM applications in production to automatically identify why they fail. Agents rarely crash with clear errors; more often they quietly give wrong answers, and finding out why requires manual trace analysis. Kelet automates this investigation by clustering failure patterns across thousands of sessions.
How It Works
- You connect your traces and signals (user feedback, edits, clicks, sentiment, LLM-as-a-judge, etc.)
- Kelet processes those signals and extracts facts about each session
- It forms hypotheses about what went wrong in each case
- It clusters similar hypotheses across sessions and investigates them together
- It surfaces a root cause with a suggested fix you can review and apply
The key insight: individual session failures look random, but when you cluster the hypotheses, failure patterns emerge.
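To make the clustering idea concrete, here is a minimal sketch of grouping per-session hypotheses so that recurring patterns stand out. It is an illustration only, not Kelet's actual algorithm or SDK: the `Hypothesis` type, the word-overlap (Jaccard) similarity, the greedy grouping, and the 0.5 threshold are all assumptions chosen to keep the example dependency-free. A real system would more likely compare embeddings than raw words.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical record: one guess about what went wrong in one session."""
    session_id: str
    text: str


def tokens(text: str) -> frozenset[str]:
    """Lowercased word set; a crude stand-in for a real embedding."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(hypotheses: list[Hypothesis], threshold: float = 0.5) -> list[list[Hypothesis]]:
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters: list[list[Hypothesis]] = []
    for h in hypotheses:
        for group in clusters:
            if jaccard(tokens(h.text), tokens(group[0].text)) >= threshold:
                group.append(h)
                break
        else:
            # No existing cluster matched, so this hypothesis seeds a new one.
            clusters.append([h])
    # Largest clusters first: these are the recurring failure patterns.
    return sorted(clusters, key=len, reverse=True)


# Made-up sample data for illustration.
hypotheses = [
    Hypothesis("s1", "retriever returned stale pricing document"),
    Hypothesis("s2", "tool call timed out"),
    Hypothesis("s3", "retriever returned stale pricing page"),
    Hypothesis("s4", "retriever returned stale refund document"),
]
ranked = cluster(hypotheses)
for group in ranked:
    print(len(group), [h.session_id for h in group])
```

Viewed one at a time, the four sessions look unrelated. Grouped, three of them point at the same retriever problem, which is the kind of pattern worth investigating together.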
Integration Options
Three ways to integrate:
- Kelet Skill for coding agents: Scans your codebase, discovers where signals should be collected, and sets everything up automatically
- Python SDK: pip install kelet
- TypeScript SDK: npm install kelet
Manual setup requires adding two lines to your agent code. Kelet is fully OpenTelemetry-compliant, so any OTEL-instrumented agent works out of the box.
Supported Frameworks and Platforms
Works with: OpenTelemetry, Langfuse, Mixpanel, OpenAI, Anthropic, LangChain, pydantic AI SDK, CrewAI, Strands, Agno, Mastra, PostHog, LangGraph, AutoGen, LlamaIndex, Haystack, Semantic Kernel, and Gemini APIs.
Works with any agent or LLM application where you own the code: agentic loops, multi-step workflows, RAG pipelines, chatbots, autonomous agents.
Two situations where Kelet isn't the right fit:
- If you use AI tools built by others as a developer (Cursor, Claude Code, Copilot)
- If you're building a skill or plugin inside an existing agentic platform
Technical Details
- Runs on Kelet's servers (SOC 2 certified)
- Continuously ingests traces 24/7
- LLM tokens for analysis are covered by Kelet (they don't touch your model API bill)
- Pricing based on usage (see kelet.ai/pricing)
- Currently free during beta (no credit card required)
Performance Metrics
From pilot cohort data:
- 73% of teams had failures nobody had noticed until Kelet found them
- 14.3 minutes median time from trace ingestion to prompt patch
- 33K+ sessions analyzed across design partner deployments
📖 Read the full source: HN AI Agents