Infracost cuts Claude token usage up to 79% by redesigning its CLI for AI agents

Infracost, a CLI tool that estimates cloud infrastructure costs from Terraform, CloudFormation, and CDK, has redesigned its output for AI coding agents like Claude Code and Cursor. The result: up to 79% fewer output tokens and 67% lower API costs vs a bare-Claude baseline. The redesign revolves around two techniques: predicate pushdown into the CLI and a token-efficient output format.
Benchmark details
- 16 questions over a 3-project Terraform fixture with 1,171 resources
- Model: Claude Opus, 5 repeats per question
- Baseline: bare Claude with Bash and Read tools, no skill loaded
- Compared against: the Infracost skill with the --llm output flag
Key results
| Metric | Bare Claude | With Infracost skill (--llm) | Change |
|---|---|---|---|
| Correct answers | 5 / 11 (45%) | 11 / 11 (100%) | +6 |
| Total cost (USD) | $16.41 | $9.63 | -41% |
| Output tokens | 207,017 | 81,697 | -61% |
| Wall time | 50 min | 50 min | tied |
One example: the question "count distinct resources failing the tagging policy, deduplicated across projects" cost $3.51 with bare Claude and hit the 25-turn cap, returning no answer. With the redesigned CLI, the same question cost $0.25 and returned the correct answer.
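The Change column and the per-question example can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the helper name pct_change is illustrative.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


# (before, after) pairs taken from the table and the example above.
rows = {
    "Total cost (USD)": (16.41, 9.63),
    "Output tokens": (207_017, 81_697),
    "Tagging-policy question cost (USD)": (3.51, 0.25),
}

for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):+.0f}%")
# Total cost (USD): -41%
# Output tokens: -61%
# Tagging-policy question cost (USD): -93%
```

The aggregate figures match the table, and the single tagging-policy question shows a much larger drop than the average, which is the kind of case the "up to" headline numbers describe.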
Technical approach
- Predicate pushdown: Instead of having the agent pipe JSON through jq or write Python parsers, the CLI accepts filtering flags (e.g., --tag-policy), offloading computation to the tool itself. This reduces the number of turns and token consumption.
- Token-efficient output format: The --llm flag returns a compact, agent-friendly format rather than verbose human-readable tables or full JSON. This alone accounts for a significant share of the reduction.
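To make the two techniques concrete, here is a minimal sketch of what pushdown plus a compact format could look like inside a tool. It is not Infracost's implementation: the resource records, field names, and output layout are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical resource records; the field names are illustrative,
# not Infracost's actual schema.
RESOURCES = [
    {"project": "prod", "address": "aws_instance.web", "monthly_cost": 70.08, "tags": {"team": "web"}},
    {"project": "prod", "address": "aws_s3_bucket.logs", "monthly_cost": 2.30, "tags": {}},
    {"project": "dev", "address": "aws_instance.web", "monthly_cost": 8.47, "tags": {}},
]


def missing_tag(resource, required_tag):
    """Predicate: True when the resource lacks the required tag."""
    return required_tag not in resource["tags"]


def render_full_json(resources):
    """Baseline: dump every record and leave the filtering to the agent."""
    return json.dumps(resources, indent=2)


def render_compact(resources, required_tag):
    """Pushdown: filter inside the tool, then emit one terse line per match."""
    failing = [r for r in resources if missing_tag(r, required_tag)]
    lines = [f"failing={len(failing)} tag={required_tag}"]
    lines += [f"{r['project']} {r['address']} {r['monthly_cost']:.2f}" for r in failing]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_compact(RESOURCES, "team"))
    print(len(render_compact(RESOURCES, "team")), "chars vs",
          len(render_full_json(RESOURCES)), "chars of full JSON")
```

The agent gets the answer (the count) on the first line and never has to read, parse, or re-filter the full payload, which is where the saved turns and tokens would come from.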
Benchmark harness gotchas
Infracost open-sourced their harness setup to help others avoid pitfalls:
- Sandbox HOME for baseline runs to avoid accidental skill loading
- Set TMPDIR to a project-local directory to circumvent macOS ACL issues
- Prepend the test binary to PATH rather than relying on system install
- Use 5+ repeats per cell due to 20-30% token variance
- Re-run cells that hit turn caps (--rerun-failed) and re-score if the verifier changes (--rescore)
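The first three items amount to building a controlled environment for each run. The following is a rough sketch of that idea, not the published harness: the function name, the .bench directory layout, and the example paths are assumptions made for illustration.

```python
import os
from pathlib import Path


def build_baseline_env(project_dir, test_bin_dir, base_env=None):
    """Return an environment dict for one isolated benchmark run.

    The .bench/home and .bench/tmp locations are hypothetical choices.
    """
    env = dict(os.environ if base_env is None else base_env)
    bench_root = Path(project_dir) / ".bench"
    sandbox_home = bench_root / "home"
    local_tmp = bench_root / "tmp"
    sandbox_home.mkdir(parents=True, exist_ok=True)
    local_tmp.mkdir(parents=True, exist_ok=True)

    # An empty HOME means no user-level skills or config can leak into the baseline.
    env["HOME"] = str(sandbox_home)
    # A project-local temp directory sidesteps permission issues on the system temp dir.
    env["TMPDIR"] = str(local_tmp)
    # Put the binary under test first so it wins over any system-wide install.
    path_parts = [str(test_bin_dir)]
    if env.get("PATH"):
        path_parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(path_parts)
    return env
```

The returned dict can be passed as the env argument of subprocess.run when launching each benchmark cell, so that the baseline and skill runs differ only in what the harness deliberately changes.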
If you maintain a CLI that AI agents call as a subprocess, the same two moves — predicate pushdown and a dedicated agent output format — likely apply. The redesign also improved the human-facing CLI, though the article focuses on the agent path.
📖 Read the full source: HN AI Agents