How Mendral Cut LLM Costs by Upgrading to Opus: Triager Pattern, SQL Access, and Sub-Agent Architecture

Mendral recently published details on how they upgraded to Opus 4.6 for CI failure analysis while reducing overall LLM costs compared to their previous setup with Sonnet 4.0. The key is an architecture that separates triage from investigation and uses cheap sub-agents for heavy lifting.
Architecture: Cheap triager, expensive planner
Out of ~4,000 CI failures analyzed, 3,187 were duplicates — a known flaky test, infrastructure hiccup, or network blip. Waking up an expensive model for those is wasteful. But deduplication isn't deterministic: the same job can fail for different reasons. Their solution is a triager pattern:
- A Haiku agent handles the narrow job: decide if a failure is already tracked. It uses exact matching and semantic search (pgvector) against known error messages. Two different strings like "operator does not exist bigint character varying" and "migration type mismatch on installation_id" are the same root cause; semantic search catches that.
- When in doubt, Haiku escalates to Opus 4.6. A false positive costs a little; a false negative misses a real bug.
- 4 out of 5 failures never reach Opus. A triager match costs ~25x less than a full investigation.
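The triage flow above can be sketched in a few lines. This is a minimal illustration, not Mendral's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model plus a pgvector lookup, and the `KnownFailure` type, the `match_at` threshold, and the issue IDs are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownFailure:
    issue_id: str   # hypothetical tracker ID
    message: str    # error message already tracked


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def triage(
    error: str, known: list[KnownFailure], match_at: float = 0.8
) -> tuple[str, Optional[str]]:
    """Return ("duplicate", issue_id) or ("escalate", None)."""
    # Step 1: exact match against known error messages.
    for k in known:
        if k.message == error:
            return ("duplicate", k.issue_id)
    # Step 2: nearest known message by similarity (assumed threshold).
    query = embed(error)
    best = max(known, key=lambda k: cosine(query, embed(k.message)), default=None)
    if best is not None and cosine(query, embed(best.message)) >= match_at:
        return ("duplicate", best.issue_id)
    # Step 3: when in doubt, hand the failure to the expensive model.
    return ("escalate", None)


known = [KnownFailure("ISSUE-1", "gyp ERR! not found: make")]
exact = triage("gyp ERR! not found: make", known)
near = triage("gyp ERR! not found: make on runner", known)
novel = triage("segfault in test_foo", known)
```

The default branch is escalation, which mirrors the trade-off described above: an unnecessary escalation only costs money, while a wrong duplicate match hides a real bug.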
Let agents pull context, don't push it
Instead of stuffing 200K+ line logs into prompts, agents get a SQL interface to ClickHouse. There's a raw table (github_logs, one row per log line) and materialized views with pre-aggregated data: failure rates by workflow, job timings, outcome counts. Most investigations start with the views to narrow down, then drill into raw logs. If a query returns too many rows, the system truncates and suggests a more specific view. If logs aren't ingested yet, agents fall back to the GitHub CLI.
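A rough sketch of the truncate-and-suggest behaviour follows. It uses Python's built-in `sqlite3` as a stand-in for ClickHouse so that it runs anywhere. Only the `github_logs` table name comes from the source; the column names, the `failure_counts_by_step` view, the row cap, and the hint wording are assumptions made for illustration.

```python
import sqlite3

MAX_ROWS = 3  # hypothetical cap, kept tiny for the demo


def run_agent_query(conn: sqlite3.Connection, sql: str, max_rows: int = MAX_ROWS) -> dict:
    """Run a query for an agent, truncating oversized results and adding a hint."""
    cur = conn.execute(sql)
    # Fetch one extra row so we can tell whether the result was cut off.
    rows = cur.fetchmany(max_rows + 1)
    truncated = len(rows) > max_rows
    result = {"rows": rows[:max_rows], "truncated": truncated}
    if truncated:
        result["hint"] = (
            "Result truncated; narrow the query or start from an "
            "aggregated view such as failure_counts_by_step."
        )
    return result


# sqlite3 stands in for ClickHouse; schema below is assumed, not Mendral's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE github_logs (run_id INTEGER, step TEXT, line TEXT)")
conn.executemany(
    "INSERT INTO github_logs VALUES (?, ?, ?)",
    [(1, "pnpm install", f"log line {i}") for i in range(10)],
)
conn.execute(
    "CREATE VIEW failure_counts_by_step AS "
    "SELECT step, COUNT(*) AS lines FROM github_logs GROUP BY step"
)

broad = run_agent_query(conn, "SELECT line FROM github_logs")
narrow = run_agent_query(conn, "SELECT step, lines FROM failure_counts_by_step")
```

Returning the hint alongside the truncated rows keeps the agent's context small while still telling it how to refine the next query.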
Expensive models plan, cheap models execute
Opus forms a hypothesis and spawns Haiku sub-agents. Nesting is capped at one level deep, so sub-agents cannot spawn their own and fan-out stays bounded. Each sub-agent gets a prompt from Opus that says exactly what to search and how. Example from a real case:
Three Storybook CI jobs failed on the same commit, crashing at pnpm install. Opus dispatched a sub-agent to fetch error messages from that step. ClickHouse didn't have the logs yet, so the sub-agent used GitHub CLI and returned: gyp ERR! not found: make — [email protected] couldn't compile because make wasn't on the runner. Opus then queried ClickHouse for the failure trend over 14 days, found the inflection point, and escalated. Sub-agent prompts are explicit: "Fetch the CI logs for this run. Return the exact error messages from the pnpm install step, the full error output, especially the last 50-100 lines."
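The one-level cap on sub-agents can be sketched as a depth check at spawn time. This is an illustration under stated assumptions: the `Agent` class, `MAX_DEPTH`, and `fake_log_fetcher` are hypothetical names, and the model calls are replaced by plain Python callables rather than real LLM requests.

```python
from dataclasses import dataclass
from typing import Callable

MAX_DEPTH = 1  # planner is depth 0; its sub-agents are depth 1 and cannot spawn


@dataclass
class Agent:
    model: str
    run: Callable[[str], str]  # stand-in for an LLM call: prompt in, text out
    depth: int = 0

    def spawn(self, model: str, run: Callable[[str], str]) -> "Agent":
        """Create a sub-agent one level deeper, refusing to exceed the cap."""
        if self.depth >= MAX_DEPTH:
            raise RuntimeError("sub-agents cannot spawn further sub-agents")
        return Agent(model=model, run=run, depth=self.depth + 1)


def fake_log_fetcher(prompt: str) -> str:
    """Pretend sub-agent that returns the error line from the example above."""
    return "gyp ERR! not found: make"


planner = Agent(model="opus", run=lambda prompt: prompt)
worker = planner.spawn("haiku", fake_log_fetcher)
finding = worker.run(
    "Fetch the CI logs for this run. Return the exact error messages "
    "from the pnpm install step."
)
```

Enforcing the cap in `spawn` rather than in the prompt means a confused sub-agent cannot recurse even if it tries to.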
Who this is for
Teams building LLM-powered agents for CI debugging or any task where context size and cost are concerns.
📖 Read the full source: HN LLM Tools