How Infracost Cut Claude Token Usage 79% for AI Agents

Infracost, a CLI tool that estimates cloud infrastructure costs from Terraform, CloudFormation, and CDK, has redesigned its output for AI coding agents such as Claude Code and Cursor. The reported result is up to 79% fewer output tokens and 67% lower API costs than a bare-Claude baseline. The redesign rests on two techniques: predicate pushdown into the CLI and a token-efficient output format.

Benchmark details

16 questions over a 3-project Terraform fixture with 1,171 resources
Model: Claude Opus, 5 repeats per question
Baseline: bare Claude with Bash and Read tools, no skill loaded
Treatment: the Infracost skill with the --llm output flag

Key results

Metric	Bare Claude	With Infracost skill (--llm)	Change
Correct answers	5 / 11 (45%)	11 / 11 (100%)	+6
Total cost (USD)	$16.41	$9.63	-41%
Output tokens	207,017	81,697	-61%
Wall time	50 min	50 min	tied
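
The Change column follows directly from the two measured columns. A minimal Python sketch of that arithmetic, using only the figures in the table above (the helper name pct_change is illustrative):

```python
def pct_change(baseline: float, treatment: float) -> float:
    """Relative change from baseline to treatment, in percent."""
    return (treatment - baseline) / baseline * 100.0


# Figures copied from the table above.
cost_change = pct_change(16.41, 9.63)
token_change = pct_change(207_017, 81_697)

print(f"Total cost: {cost_change:+.0f}%")      # Total cost: -41%
print(f"Output tokens: {token_change:+.0f}%")  # Output tokens: -61%
```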

One example: the question "count distinct resources failing the tagging policy, deduplicated across projects" cost $3.51 with bare Claude and hit the 25-turn cap, returning no answer. With the redesigned CLI, the same question cost $0.25 and returned the correct answer.

Technical approach

Predicate pushdown: Instead of having the agent pipe JSON through jq or write Python parsers, the CLI accepts filtering flags (e.g., --tag-policy), offloading computation to the tool itself. This reduces the number of turns and token consumption.
Token-efficient output format: The --llm flag returns a compact, agent-friendly format rather than verbose human-readable tables or full JSON. This alone accounts for a significant share of the reduction.
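
To make the two techniques concrete, here is a small Python sketch of a hypothetical tool. It is not Infracost's implementation or its actual --llm format: the record fields, the assumed tagging policy, the function names, and the compact line layout are all invented for illustration. It contrasts handing the agent every resource as JSON with filtering and deduplicating inside the tool and emitting one short line per match.

```python
import json

# Hypothetical resource records; field names are illustrative only.
RESOURCES = [
    {"project": "a", "address": "aws_instance.web", "tags": {"team": "core"}, "monthly_cost": 61.2},
    {"project": "a", "address": "aws_s3_bucket.logs", "tags": {}, "monthly_cost": 4.1},
    {"project": "b", "address": "aws_s3_bucket.logs", "tags": {}, "monthly_cost": 4.1},
    {"project": "c", "address": "aws_db_instance.main", "tags": {"env": "prod"}, "monthly_cost": 210.0},
]

REQUIRED_TAGS = {"team"}  # assumed policy: every resource needs a "team" tag


def full_json(resources):
    """No pushdown: return everything and leave filtering to the agent."""
    return json.dumps(resources, indent=2)


def failing_tag_policy(resources, required=REQUIRED_TAGS):
    """Pushdown: filter inside the tool and deduplicate by resource address."""
    failing = {}
    for r in resources:
        if not required.issubset(r["tags"]):
            # Keep the first record seen for each address across projects.
            failing.setdefault(r["address"], r)
    return list(failing.values())


def compact(resources):
    """Compact output: a count line plus one terse line per resource."""
    lines = [f"failing={len(resources)}"]
    lines += [f"{r['address']} ${r['monthly_cost']:.2f}/mo" for r in resources]
    return "\n".join(lines)


verbose = full_json(RESOURCES)
terse = compact(failing_tag_policy(RESOURCES))
print(terse)
print(f"chars: full JSON={len(verbose)}, compact={len(terse)}")
```

Character count is only a rough proxy for tokens, but the shape of the saving is the same: the agent receives the answer in a few lines and has no filtering code to write, run, or debug across extra turns.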

Benchmark harness gotchas

Infracost open-sourced their harness setup to help others avoid pitfalls:

Sandbox HOME for baseline runs to avoid accidental skill loading
Set TMPDIR to a project-local directory to circumvent macOS ACL issues
Prepend the test binary to PATH rather than relying on system install
Use 5+ repeats per cell due to 20-30% token variance
Re-run cells that hit turn caps (--rerun-failed) and re-score if the verifier changes (--rescore)
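
The first three points come down to controlling the environment of each agent subprocess. A minimal Python sketch, assuming a hypothetical layout in which the binary under test lives in build/bin and scratch space in .bench-tmp under the project directory (neither path, nor the function name, comes from the Infracost harness):

```python
import os
import tempfile
from pathlib import Path


def sandbox_env(project_dir: Path, bin_dir: Path, isolate_home: bool) -> dict:
    """Build the environment for one benchmark run."""
    env = dict(os.environ)

    # Project-local TMPDIR instead of the system temp directory.
    tmp_dir = project_dir / ".bench-tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    env["TMPDIR"] = str(tmp_dir)

    # Put the binary under test first so it shadows any system install.
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")

    if isolate_home:
        # Fresh, empty HOME so baseline runs cannot pick up installed skills.
        env["HOME"] = tempfile.mkdtemp(prefix="home-", dir=str(tmp_dir))
    return env


# Throwaway project directory for the example.
project = Path(tempfile.mkdtemp())
env = sandbox_env(project, project / "build" / "bin", isolate_home=True)
print(env["PATH"].split(os.pathsep)[0])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching the agent, with isolate_home=True for baseline cells only.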

If you maintain a CLI that AI agents call as a subprocess, the same two moves (predicate pushdown and a dedicated agent output format) likely apply. The redesign also improved the human-facing CLI, though the article focuses on the agent path.

📖 Read the full source: HN AI Agents