Infracost cuts Claude token usage 79% by redesigning CLI for AI agents

✍️ OpenClawRadar · 📅 Published: May 19, 2026 · 🔗 Source

Infracost, a CLI tool that estimates cloud infrastructure costs from Terraform, CloudFormation, and CDK, has redesigned its output for AI coding agents such as Claude Code and Cursor. The headline result is up to 79% fewer output tokens and 67% lower API costs than a bare-Claude baseline; the aggregate benchmark totals below show a 61% drop in output tokens and a 41% drop in cost. The redesign rests on two techniques: predicate pushdown into the CLI and a token-efficient output format.

Benchmark details

  • 16 questions over a 3-project Terraform fixture with 1,171 resources
  • Model: Claude Opus, 5 repeats per question
  • Baseline: bare Claude with Bash and Read tools, no skill loaded
  • Treatment: the Infracost skill loaded, with the CLI's --llm output flag

Key results

| Metric | Bare Claude | With Infracost skill (--llm) | Change |
| --- | --- | --- | --- |
| Correct answers | 5 / 11 (45%) | 11 / 11 (100%) | +6 |
| Total cost (USD) | $16.41 | $9.63 | -41% |
| Output tokens | 207,017 | 81,697 | -61% |
| Wall time | 50 min | 50 min | tied |
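
The Change column is a plain relative change, (after - before) / before, rounded to whole percent. A minimal Python check against the raw totals in the table, plus the single tagging-policy question from the example that follows:

```python
# Raw totals from the results table: (bare Claude, Infracost skill with --llm).
TOTALS = {
    "cost_usd": (16.41, 9.63),
    "output_tokens": (207_017, 81_697),
}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative values are reductions."""
    return (after - before) / before * 100.0


for metric, (before, after) in TOTALS.items():
    print(f"{metric}: {pct_change(before, after):+.1f}%")
# cost_usd: -41.3%
# output_tokens: -60.5%

# Per-question cost for the tagging-policy question ($3.51 -> $0.25).
print(f"tagging-policy question cost: {pct_change(3.51, 0.25):+.1f}%")
# tagging-policy question cost: -92.9%
```

Both totals round to the -41% and -61% shown in the table.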

One example: the question "count distinct resources failing the tagging policy, deduplicated across projects" cost $3.51 with bare Claude and hit the 25-turn cap, returning no answer. With the redesigned CLI, the same question cost $0.25 and returned the correct answer.
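
The deduplication step is what makes that question expensive for a bare agent: a resource reported by several projects must be counted once, and without tool support the agent has to assemble this with jq or ad-hoc scripts across all three projects. The sketch below shows the computation itself. The project layout, resource addresses, the `REQUIRED_TAGS` policy, and the choice of resource address as the cross-project identity are all hypothetical; the article does not describe how Infracost's tagging-policy check identifies resources.

```python
# Hypothetical fixture: project name -> list of (resource address, tags).
# Using the address as the cross-project identity is an assumption for this sketch.
PROJECTS: dict[str, list[tuple[str, dict[str, str]]]] = {
    "network": [
        ("aws_vpc.main", {"team": "platform", "env": "prod"}),
        ("aws_s3_bucket.logs", {"team": "platform"}),
    ],
    "app": [
        ("aws_s3_bucket.logs", {"team": "platform"}),  # shared resource, listed again
        ("aws_instance.web", {}),
    ],
    "data": [
        ("aws_db_instance.main", {"team": "data", "env": "prod"}),
    ],
}

REQUIRED_TAGS = {"team", "env"}  # hypothetical tagging policy


def violations(projects, required):
    """Yield (project, address) for every resource missing at least one required tag."""
    for project, resources in projects.items():
        for address, tags in resources:
            # issubset() iterates the tags dict, i.e. its keys.
            if not required.issubset(tags):
                yield project, address


def count_distinct_failing(projects, required) -> int:
    # A set keyed on address collapses the same resource reported by several projects.
    return len({address for _, address in violations(projects, required)})


if __name__ == "__main__":
    naive = sum(1 for _ in violations(PROJECTS, REQUIRED_TAGS))
    print(naive, count_distinct_failing(PROJECTS, REQUIRED_TAGS))
```

Here the naive count is 3 but the deduplicated count is 2, because `aws_s3_bucket.logs` appears in two projects.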


Technical approach

  • Predicate pushdown: Instead of having the agent pipe JSON through jq or write Python parsers, the CLI accepts filtering flags (e.g., --tag-policy), so the filtering runs inside the tool rather than in the agent's context. The agent needs fewer turns, and less raw data comes back to the model as tokens.
  • Token-efficient output format: The --llm flag returns a compact, agent-friendly format rather than verbose human-readable tables or full JSON. This alone accounts for a significant share of the reduction.
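
Both techniques can be illustrated with a toy command-line tool. This is a minimal sketch, not Infracost's implementation: the `costcli` name, the `--missing-tag` and `--compact` flags, the record fields, and the tab-separated layout are hypothetical stand-ins, since the article does not document the exact semantics of `--tag-policy` or the layout of `--llm` output. What it shows is where the filter runs and how much text comes back.

```python
import argparse
import json

# Hypothetical records a cost tool might hold after parsing a Terraform fixture.
RESOURCES = [
    {"address": "aws_instance.web", "project": "app", "monthly_cost": 70.08,
     "tags": {}, "components": [{"name": "Instance usage", "unit": "hours", "qty": 730}]},
    {"address": "aws_db_instance.main", "project": "data", "monthly_cost": 124.10,
     "tags": {"team": "data", "env": "prod"},
     "components": [{"name": "Database instance", "unit": "hours", "qty": 730}]},
    {"address": "aws_s3_bucket.logs", "project": "network", "monthly_cost": 2.30,
     "tags": {"team": "platform"}, "components": [{"name": "Storage", "unit": "GB", "qty": 100}]},
]


def run(argv: list[str]) -> str:
    """Toy CLI entry point; returns what would be printed to stdout."""
    parser = argparse.ArgumentParser(prog="costcli")
    parser.add_argument("--missing-tag", action="append",
                        help="keep only resources missing this tag (repeatable; any match)")
    parser.add_argument("--compact", action="store_true",
                        help="terse tab-separated rows instead of full JSON")
    args = parser.parse_args(argv)

    rows = RESOURCES
    if args.missing_tag:
        # Predicate pushdown: the filter runs here, so non-matching rows never reach the agent.
        rows = [r for r in rows if any(tag not in r["tags"] for tag in args.missing_tag)]

    if args.compact:
        # Agent-oriented format: a summary line, then one short row per resource, no nesting.
        body = [f"{r['project']}\t{r['address']}\t{r['monthly_cost']:.2f}" for r in rows]
        return "\n".join([f"count={len(rows)}", *body])
    # Default: every field, pretty-printed, which the agent would then have to filter itself.
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    full = run([])
    pushed = run(["--missing-tag", "env", "--compact"])
    print(pushed)
    print(f"{len(full)} chars without pushdown vs {len(pushed)} with it")
```

`run([])` returns all three records as indented JSON, while `run(["--missing-tag", "env", "--compact"])` returns a count line and two short rows. Character count is only a rough proxy for tokens, but in this toy the gap widens as the fixture grows, since the verbose path scales with every resource and the compact path only with the matching ones.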

Benchmark harness gotchas

Infracost open-sourced its harness setup to help others avoid these pitfalls:

  • Sandbox HOME for baseline runs to avoid accidental skill loading
  • Set TMPDIR to a project-local directory to circumvent macOS ACL issues
  • Prepend the directory containing the test binary to PATH rather than relying on a system-wide install
  • Use at least 5 repeats per cell, because token counts vary by 20-30% between runs
  • Re-run cells that hit turn caps (--rerun-failed) and re-score if the verifier changes (--rescore)
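
The harness advice translates into a few small helpers. This is a hypothetical sketch, not Infracost's published harness: `build_env`, `summarize`, `cells_to_rerun`, the `.tmp` directory name, and the `hit_turn_cap` result field are illustrative names. It covers environment isolation (sandboxed HOME, project-local TMPDIR, binary directory first on PATH) and the per-cell bookkeeping for repeats and turn-capped reruns.

```python
import os
import statistics
import tempfile
from pathlib import Path


def build_env(project_dir: Path, test_bin_dir: Path, baseline: bool) -> dict[str, str]:
    """Environment for one benchmark run (hypothetical helper)."""
    env = dict(os.environ)
    # Put the directory holding the binary under test first, so it shadows any system install.
    # An empty PATH is not joined in, since a trailing empty entry would mean "current directory".
    old_path = env.get("PATH", "")
    env["PATH"] = os.pathsep.join([str(test_bin_dir), old_path]) if old_path else str(test_bin_dir)
    # A project-local temp directory avoids the macOS ACL issues seen with the default TMPDIR.
    tmp_dir = project_dir / ".tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    env["TMPDIR"] = str(tmp_dir)
    if baseline:
        # A fresh, empty HOME means no user-level skills or config are picked up by accident.
        env["HOME"] = tempfile.mkdtemp(prefix="bench-home-", dir=str(tmp_dir))
    return env


def summarize(token_counts: list[int]) -> tuple[float, float]:
    """Mean and coefficient of variation (stdev / mean) over the repeats of one cell."""
    mean = statistics.mean(token_counts)
    return mean, statistics.stdev(token_counts) / mean


def cells_to_rerun(results: dict[str, list[dict]]) -> list[str]:
    """Cells where any repeat hit the turn cap (hypothetical result schema)."""
    return [cell for cell, runs in results.items() if any(r["hit_turn_cap"] for r in runs)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        env = build_env(Path(root), Path(root) / "bin", baseline=True)
        print(env["PATH"].split(os.pathsep)[0], env["TMPDIR"], env["HOME"])
    # Made-up output-token counts for 5 repeats of one cell.
    mean, cv = summarize([15_200, 18_900, 13_400, 17_100, 20_800])
    print(f"mean={mean:.0f} cv={cv:.0%}")
    print(cells_to_rerun({"q1": [{"hit_turn_cap": False}], "q2": [{"hit_turn_cap": True}]}))
```

Passing the returned dict as `env=` to `subprocess.run` keeps each run isolated without touching the harness's own environment. Tracking the coefficient of variation per cell shows when 5 repeats are too few to separate a real difference from run-to-run noise.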

If you maintain a CLI that AI agents call as a subprocess, the same two moves, predicate pushdown and a dedicated agent output format, likely apply. The redesign also improved the human-facing CLI, though the article focuses on the agent path.

📖 Read the full source: HN AI Agents
