AGENTS.md Best Practices: 25% Correctness Boost or 30% Drop

Augment Code ran a systematic study on AGENTS.md files across their monorepo. The best files gave their coding agent a quality jump equivalent to upgrading from Haiku to Opus; the worst made output worse than having no AGENTS.md at all. The same file boosted best_practices by 25% on a routine bug fix and dropped completeness by 30% on a complex feature task in the same module. Here's what works.

How They Measured

They used AuggieBench, an internal eval suite. They started with high-quality PRs from a large repo that reflect typical day-to-day agent tasks, set up the environment and prompt, and asked the agent to reproduce each PR. They compared the agent's output against the golden PR (the version that landed after review by multiple senior engineers). Each PR had to be contained within a single module or app, and its scope had to be one where an AGENTS.md might plausibly help. Each task ran twice: once with the file and once without.
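The write-up does not say how the percentage figures were computed. One plausible reading is a relative change between the paired runs, which might be sketched as below. The function name, the `MetricScores` type, and the formula itself are assumptions for illustration, not AuggieBench's actual scoring.

```typescript
// Assumed shape: metric name -> score for one run of a task.
type MetricScores = Record<string, number>;

// Relative change (in percent) from the run without AGENTS.md to the run
// with it. This formula is an assumption; the source only reports the
// resulting percentages.
function relativeChangePercent(
  withoutFile: MetricScores,
  withFile: MetricScores
): MetricScores {
  const change: MetricScores = {};
  for (const metric of Object.keys(withoutFile)) {
    const baseline = withoutFile[metric];
    // Skip metrics that are missing from the second run or have a zero
    // baseline, where a relative change is undefined.
    if (!(metric in withFile) || baseline === 0) continue;
    change[metric] = ((withFile[metric] - baseline) / baseline) * 100;
  }
  return change;
}
```

A positive value means the file helped on that metric for that task, and a negative value means it hurt, which is how the same file can show both a boost and a drop.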

What Works

1. Progressive Disclosure > Comprehensive Coverage

Cover common cases and workflows at a high level; push details into reference files the agent can load on demand. Keep each reference's scope clear. Files of 100–150 lines with a handful of focused reference documents delivered 10–15% improvements across metrics in mid-size modules (~100 core files). Beyond that length, gains reversed.
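A size budget like this is easy to check mechanically. The following is a minimal sketch, assuming references are ordinary Markdown links to other .md files; the 150-line ceiling is taken from the range above, and the function and field names are hypothetical.

```typescript
// Upper end of the 100–150 line range reported above.
const MAX_LINES = 150;

interface AgentsFileReport {
  lineCount: number;
  withinBudget: boolean;
  references: string[]; // distinct .md paths linked from the file
}

// Takes the file's text (not a path) so the sketch has no I/O dependencies.
function inspectAgentsFile(content: string): AgentsFileReport {
  // Ignore a single trailing newline so it is not counted as an extra line.
  const lineCount = content.replace(/\r?\n$/, "").split(/\r?\n/).length;

  // Collect Markdown link targets ending in .md, e.g. [Deploy](docs/deploy.md).
  const linkPattern = /\]\(([^)\s]+\.md)\)/g;
  const references: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(content)) !== null) {
    if (references.indexOf(match[1]) === -1) {
      references.push(match[1]);
    }
  }

  return { lineCount, withinBudget: lineCount <= MAX_LINES, references };
}
```

A check like this could run in CI to flag a file that has grown past the budget, at which point detail would move into one of the listed reference files.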

2. Procedural Workflows

A numbered, multi-step workflow can move the agent from failing to finishing. Example: a six-step workflow for deploying a new integration. PRs with missing wiring files dropped from 40% to 10%, the agent finished faster, correctness rose 25%, and completeness rose 20%. Keep the main file concise and use reference files for branching cases.
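The "missing wiring files" failure can be pictured as a checklist compared against the files a change touched. This is a rough sketch only: the source does not list the six steps, so every step description and path below is hypothetical.

```typescript
interface WorkflowStep {
  description: string;
  requiredFile: string; // file the step is expected to create or edit
}

// Hypothetical steps; the real workflow in the study is not published.
const integrationWorkflow: WorkflowStep[] = [
  { description: "Add the integration client", requiredFile: "integrations/acme/client.ts" },
  { description: "Register it in the integration index", requiredFile: "integrations/index.ts" },
  { description: "Add its config schema", requiredFile: "config/integrations.schema.json" },
];

// Returns the steps whose required file does not appear in the change set.
function findSkippedSteps(
  steps: WorkflowStep[],
  changedFiles: string[]
): WorkflowStep[] {
  return steps.filter((step) => changedFiles.indexOf(step.requiredFile) === -1);
}
```

Writing the workflow as numbered steps in AGENTS.md gives the agent the same list up front, so it does not have to discover the wiring files by searching the repo.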

3. Decision Tables

When two or three reasonable ways exist (e.g., React Query vs Zustand for state management), force the choice up front with a table. Example:

Question | React Query | Zustand
Server is the only data source? | ✅ |
Multiple code paths mutate this state? | | ✅
Need optimistic updates mixed with local state? | | ✅

PRs in that area scored 25% higher on best_practices.
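The same decision can be written as a function, which shows why a table removes ambiguity. This sketch reads the table as pointing to React Query when the server is the only data source and to Zustand for the other two questions. That column assignment, the priority given to the Zustand rows, and all names are assumptions.

```typescript
type StateLibrary = "React Query" | "Zustand";

interface StateQuestions {
  serverIsOnlySource: boolean;
  multipleCodePathsMutate: boolean;
  optimisticUpdatesWithLocalState: boolean;
}

// Assumed precedence: either Zustand condition wins over the React Query one,
// since both imply client-side state the server does not own.
function chooseStateLibrary(q: StateQuestions): StateLibrary | "undecided" {
  if (q.multipleCodePathsMutate || q.optimisticUpdatesWithLocalState) {
    return "Zustand";
  }
  if (q.serverIsOnlySource) {
    return "React Query";
  }
  // The table has no row for this case, so the sketch does not guess.
  return "undecided";
}
```

The "undecided" branch marks the cases the table does not cover, which are the ones worth a line in a reference file.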

4. Short Production Examples

3–10 line snippets from actual production code improved reuse and pattern adherence. Example: copy-paste templates for Redux Toolkit primitives (createSlice with typed initial state, createAsyncThunk with error handling, typed useAppSelector). code_reuse went up 20%.

5. Domain-Specific Rules

Domain-specific rules still matter; they are the pattern most people already associate with AGENTS.md.

📖 Read the full source: HN AI Agents
