ETH Zurich Study Questions Value of AGENTS.md Files for AI Coding Agents

Research Findings on AGENTS.md Files
A new paper from ETH Zurich researchers challenges the widespread industry practice of using AGENTS.md files with AI coding agents. The study, conducted by Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev, provides empirical evidence that these context files often hinder rather than help AI agents.
Methodology and Testing
The team built AGENTbench, a new dataset of 138 real-world Python tasks sourced from niche repositories, chosen to avoid the bias of popular benchmarks such as SWE-bench that AI models may have memorized. They tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2, GPT-5.1 mini, and Qwen Code) across three scenarios:
- No context file
- LLM-generated AGENTS.md file
- Human-written AGENTS.md file
Performance was measured using three proxy indicators: task success rates (determined by repository unit tests), number of agent steps, and overall inference costs.
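As a rough illustration of how these three indicators could be tallied per scenario, here is a minimal Python sketch. The `Run` record, its field names, the scenario labels, and the sample values are all hypothetical and are not taken from the paper or its tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One agent attempt at one task (hypothetical record layout)."""
    scenario: str    # assumed labels: "none", "llm", or "human"
    passed: bool     # whether the repository's unit tests passed
    steps: int       # number of agent steps taken
    cost_usd: float  # inference cost of the attempt


def summarize(runs):
    """Return success rate, mean steps, and mean cost for each scenario."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.scenario].append(run)

    summary = {}
    for scenario, group in grouped.items():
        n = len(group)
        summary[scenario] = {
            "success_rate": sum(r.passed for r in group) / n,
            "mean_steps": sum(r.steps for r in group) / n,
            "mean_cost_usd": sum(r.cost_usd for r in group) / n,
        }
    return summary


# Made-up records for illustration only; these are not results from the study.
runs = [
    Run("none", True, 10, 0.20),
    Run("none", False, 14, 0.30),
    Run("llm", False, 16, 0.35),
    Run("llm", True, 14, 0.25),
]
summary = summarize(runs)
for scenario, stats in summary.items():
    print(scenario, stats)
```

Grouping by scenario first keeps the comparison symmetric: each condition is summarized with the same three numbers, so the no-context baseline can be compared directly against either kind of context file.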
Key Results
LLM-generated context files degraded performance, reducing task success rates by an average of 3% compared to providing no context file. These files consistently increased the number of steps agents took, driving up inference costs by over 20%.
Human-written files showed marginal gains with a 4% average increase in task success rate on AGENTbench, but this came with a parallel increase in steps, raising costs by up to 19%.
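The summary above does not say whether the success-rate figures are percentage points or relative changes, so the sketch below simply shows how the two readings differ. The function names and all input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def percentage_point_change(baseline_rate, new_rate):
    """Absolute difference between two rates, in percentage points."""
    return (new_rate - baseline_rate) * 100


def relative_change_percent(baseline, new):
    """Change relative to the baseline value, in percent."""
    return (new - baseline) / baseline * 100


# Hypothetical success rates: 40% without a context file, 44% with one.
pp = percentage_point_change(0.40, 0.44)
rel = relative_change_percent(0.40, 0.44)
print(f"{pp:.1f} percentage points, {rel:.1f}% relative")

# Hypothetical mean cost per task: $1.00 without a context file, $1.20 with one.
cost_rel = relative_change_percent(1.00, 1.20)
print(f"{cost_rel:.1f}% higher cost")
```

Under these made-up inputs the same success-rate shift reads as 4 percentage points or as a 10% relative gain, which is why the basis matters when comparing a small accuracy change against a cost increase of around 20%.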
Including architectural overviews or repository structure explanations in AGENTS.md files did not reduce the time models spent locating relevant files for tasks.
Behavior Analysis
Trace analysis revealed that agents generally followed instructions in AGENTS.md files, leading them to run more tests, read more files, execute more grep searches, and perform more code-quality checks. While thorough, this behavior was often unnecessary for resolving specific tasks, forcing reasoning models to "think" harder without yielding better final patches.
Practical Recommendations
The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to details an agent cannot infer from the repository itself, such as highly specific tooling or custom build commands. They note that 60,000 open-source repositories currently contain context files like AGENTS.md, and that many agent frameworks feature built-in commands to auto-generate them, yet these files have only marginal effects on task success.