Agentic Context Engine: Automated Agent Improvement Loop with 34.2% Accuracy Gain

✍️ OpenClawRadar
📅 Published: March 17, 2026

Automating the Agent Improvement Loop

A developer has open-sourced a system that automates the entire process of improving AI agents by letting them self-analyze and self-correct. The tool addresses the common problem of manually reading logs, tweaking prompts, and hoping for improvements.

The Five-Step Process

The automated loop follows five distinct steps:

  • Trace analysis: Analyzes traces to determine not just what failed but why, whether it's a one-off or systemic issue, and what category of failure it is. Outputs a structured breakdown of failure modes rather than just error lists.
  • Eval generation: Creates specific evaluations to validate the analysis and measure fixes. Generic evals don't catch specific failures. LLM-as-a-judge serves as a fallback when trace data isn't structured enough for deterministic evals.
  • Baseline measurement: Runs evals against the current agent before making fixes to establish baselines and validate the evals themselves.
  • Fix implementation: A developer examines the analysis and codebase to decide what to change. The key decision is whether the fix belongs in the prompt or in the surrounding code (e.g., when the harness handles tool outputs poorly or doesn't pass the right context).
  • Verification and compounding: After fixes, evals run again to verify improvement, with changes kept, rolled back, or reworked.
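
The baseline and verification steps above can be sketched as a small comparison routine. This is a minimal illustration, not the tool's actual code: the names `Agent`, `Eval`, `exact_match_eval`, `verify_fix`, and the `min_gain` threshold are all hypothetical, and the keep/rollback/rework rule is one plausible reading of the description.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types: none of these names come from the open-sourced tool.
Agent = Callable[[str], str]      # maps a task prompt to an answer
Eval = Callable[[Agent], float]   # returns a pass rate in [0, 1]


@dataclass
class Verdict:
    decision: str    # "keep", "rollback", or "rework"
    baseline: float  # mean eval score before the fix
    after: float     # mean eval score after the fix


def exact_match_eval(prompt: str, expected: str) -> Eval:
    """Build a deterministic eval that targets one specific failure."""
    return lambda agent: 1.0 if agent(prompt) == expected else 0.0


def mean_score(agent: Agent, evals: List[Eval]) -> float:
    """Average score of an agent across all evals."""
    return sum(e(agent) for e in evals) / len(evals)


def verify_fix(current: Agent, candidate: Agent, evals: List[Eval],
               min_gain: float = 0.01) -> Verdict:
    """Compare the fixed agent against the baseline on the same evals."""
    baseline = mean_score(current, evals)  # baseline measurement
    after = mean_score(candidate, evals)   # verification run
    if after - baseline >= min_gain:
        decision = "keep"       # measurable improvement
    elif after < baseline:
        decision = "rollback"   # the fix made things worse
    else:
        decision = "rework"     # no meaningful change either way
    return Verdict(decision, baseline, after)
```

Running the same evals before and after the change is what makes the comparison meaningful; an eval written only after the fix has no baseline to compare against.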

Implementation Details

The solution automates this entire loop end-to-end with one command that invokes a self-analyzing agentic system. Trace analysis happens in a REPL environment with agents tuned for this specific use case. The system then hands the analysis to Claude Code through CLI access, and Claude Code handles the remaining steps using a set of skills.

Since Claude can live inside the codebase, it validates the analysis and decides on the best course of action in the fix stage (prompt vs. code).

Results and Operation

On Tau-2 Bench, a single iteration of the loop achieved a 34.2% accuracy gain without manual intervention. The system is designed to compound improvements: new traces reveal new problems, leading to new fixes in each cycle.
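
The source reports the gain as a single percentage and does not say whether it is an absolute change (percentage points) or a relative one. The sketch below shows how the two readings differ. The function name `accuracy_gain` and the example accuracies are hypothetical and are not the benchmark's actual numbers.

```python
def accuracy_gain(baseline: float, after: float) -> dict:
    """Return the gain both ways for accuracies given as fractions in [0, 1]."""
    if not 0.0 < baseline <= 1.0:
        raise ValueError("baseline must be in (0, 1]")
    absolute_points = (after - baseline) * 100            # percentage points
    relative_percent = (after - baseline) / baseline * 100  # percent of baseline
    return {
        "absolute_points": round(absolute_points, 2),
        "relative_percent": round(relative_percent, 2),
    }


# Hypothetical example: accuracy rises from 40% to 50%.
example = accuracy_gain(0.40, 0.50)
print(example)
```

With these illustrative inputs the same improvement reads as 10 percentage points or as a 25% relative gain, which is why the reporting convention matters when comparing results.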

You can set it to loop fully autonomously. A human-in-the-loop option exists if you want to approve fixes before the fix implementation step (step 4), but in testing, the developer "just let it rip."
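
One way to picture the difference between the autonomous and human-in-the-loop modes is an optional approval callback placed in front of the fix step. This is a sketch under assumptions: `apply_fixes`, `apply_fix`, and `approve` are hypothetical names, and the tool's real interface may look quite different.

```python
from typing import Callable, List, Optional


def apply_fixes(proposed: List[str],
                apply_fix: Callable[[str], None],
                approve: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Apply proposed fixes, asking for approval first when a gate is set.

    With approve=None the loop runs fully autonomously.
    """
    applied = []
    for fix in proposed:
        # Human-in-the-loop: skip any fix the reviewer rejects.
        if approve is not None and not approve(fix):
            continue
        apply_fix(fix)
        applied.append(fix)
    return applied


# Hypothetical usage: record applied fixes in a list instead of editing files.
log: List[str] = []
fixes = ["prompt: clarify refund policy", "code: pass tool output to context"]

# Autonomous mode applies everything.
apply_fixes(fixes, log.append)

# Gated mode: this reviewer only approves prompt changes.
approved = apply_fixes(fixes, log.append, approve=lambda f: f.startswith("prompt:"))
```

Keeping the gate as a single optional parameter means the same loop serves both modes, so switching from supervised to autonomous operation does not change the rest of the pipeline.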

The tool is open source on GitHub: https://github.com/kayba-ai/agentic-context-engine

📖 Read the full source: r/ClaudeAI

