LLMs generate SQL queries to analyze terabytes of CI logs in seconds

Mendral's AI agent traced a flaky test to a dependency bump three weeks prior by writing its own SQL queries, scanning hundreds of millions of log lines across a dozen queries, and following a trail from job metadata to raw log output. The whole investigation took seconds.
Every week, about 1.5 billion CI log lines and 700K jobs flow through their system. All of it lands in ClickHouse, compressed at 35:1. All of it is queryable in milliseconds.
SQL interface for the agent
They expose a SQL interface to the agent, scoped to the organization it's investigating. The agent constructs its own queries based on the question. No predefined query library, no rigid tool API. A constrained tool API like get_failure_rate(workflow, days) would limit the agent to anticipated questions. A SQL interface lets it ask questions never thought of, which matters when debugging novel failures.
The agent queries two main targets:
- Job metadata: a materialized view with one row per CI job execution. The agent uses this 63% of the time for questions like "how often does this fail?", "what's the success rate?", "which jobs are slowest?", "when did this start failing?"
- Raw log lines: one row per log line. The agent uses this 37% of the time for questions like "show me the error output for this job", "when did this log pattern first appear?", "how often does this error message occur across runs?"
Query patterns and scale
They analyzed 8,534 agent sessions and 52,312 queries from their observability pipeline. The agent doesn't stop at one query. It investigates - starts broad, then drills in.
Total rows scanned across all queries to answer one question:
- Typical question: 335K rows across about 3 queries
- P75: 5.2 million rows
- P95: 940 million rows
- Heaviest raw-log sessions scan 4.3 billion rows
The agent averages 4.4 queries per session. A typical investigation begins with job metadata (cheap queries, median 47K rows) against a compact, pre-aggregated materialized view. When it finds something interesting, it drills into raw logs (expensive queries, median 1.1M rows).
Data architecture
For the agent to query this fast, the data needs to be structured for it. Up to 300 million log lines flow through on a busy day. They use ClickHouse.
Every log line carries 48 columns of metadata: the full context of the CI run it belongs to. Commit SHA, author, branch, PR title, workflow name, job name, step name, runner info, timestamps, and more.
In ClickHouse's columnar format, denormalizing 48 columns onto every single log line is essentially free. A column like commit_message has the same value for every log line in a CI run, and a single run can produce thousands of log lines. ClickHouse stores those thousands of identical values in sequence. The compression algorithm sees the repetition and compresses it to almost nothing.
Compression ratios:
commit_message: 301:1display_title: 160:1workflow_path: 79:1step_name: 52:1job_name: 48:1
Without denormalization, every query would require a join. With it, they're all simple filters.
📖 Read the full source: HN LLM Tools
👀 See Also

Hermes vs. OpenClaw: The Difference Is Personality, Not Speed
A developer compares Hermes and OpenClaw side by side and finds the key difference is how each handles identity: Hermes stores memories, OpenClaw stores personality facets via soul.md.

Forge agent autonomously fixes GitHub bug using Claude AI
A developer's Forge agent detected a GitHub bug report, triggered a pipeline, used Claude AI to analyze and fix the issue, and opened a PR—all without human intervention while the developer slept.

AI YouTube Creator Reports Monetization Earnings and Workflow Shift
A developer using Claude Opus 4.6 for scripting reported earning $12.20 from 28,400 views on their AI-generated YouTube channel, prompting a shift toward freelance content creation for businesses.

OpenClaw setup evolution: from overconfiguration to practical multi-agent system
A developer shares their journey from three reinstalls to a functional OpenClaw setup with multi-agent specialization, layered memory, and semantic search using QMD backend, running on Mac mini M2 with separate Hetzner instance for experimentation.