Recursive Self-Improvement Framework for AI Coding Agents Using Claude Code

A developer has open-sourced a framework that enables AI coding agents to recursively improve themselves using Claude Code. The system was developed after months of research into how model providers implement recursive agent optimization.
How It Works
The framework provides a structured approach to agent improvement:
1. Add tracing to your agent with 2 lines of code (or skip to step 3 if you already have traces)
2. Run your agent multiple times to collect execution traces
3. Run /recursive-improve in Claude Code
4. The system analyzes traces, finds failure patterns, plans fixes, and presents them for approval
5. Apply fixes, run the agent again, and verify improvement with /benchmark against the baseline
6. Repeat cycles to continue improvement
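The analysis and verification steps above can be sketched in a few lines of Python. This is an illustration only, not the framework's code: the post does not describe the trace schema, so the `success` and `error` fields, the function names, and the sample data are all assumptions.

```python
from collections import Counter

# Hypothetical trace records. The field names ("success", "error") are
# assumptions made for this sketch, not the framework's actual schema.

def failure_patterns(traces):
    """Count failed runs per error label, most frequent first."""
    counts = Counter(t["error"] for t in traces if not t["success"])
    return counts.most_common()

def success_rate(traces):
    """Fraction of runs that succeeded (0.0 for an empty list)."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if t["success"]) / len(traces)

def improved(baseline, candidate):
    """True if the candidate runs beat the baseline success rate."""
    return success_rate(candidate) > success_rate(baseline)

# Made-up sample data: four runs before a fix and four runs after it.
baseline = [
    {"success": False, "error": "wrong_tool"},
    {"success": False, "error": "wrong_tool"},
    {"success": False, "error": "timeout"},
    {"success": True, "error": None},
]
after_fix = [
    {"success": True, "error": None},
    {"success": True, "error": None},
    {"success": False, "error": "timeout"},
    {"success": True, "error": None},
]

print(failure_patterns(baseline))   # most common failure comes first
print(success_rate(baseline), success_rate(after_fix))
print(improved(baseline, after_fix))
```

Ranking failures by frequency is one simple way to decide which fix to plan first; the framework may prioritize differently.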
Autonomous Option
For fully autonomous operation (similar to Karpathy's autoresearch):
- Run /ratchet to execute the entire improvement loop automatically
- The system improves, evaluates, and keeps or reverts changes
- Only improvements survive
- It can run overnight, so you wake up to a better agent
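The keep-or-revert behavior described above is a ratchet: a change survives only if it scores better than the best version so far. The sketch below shows that control flow with toy stand-ins. The `ratchet`, `propose`, and `evaluate` names are hypothetical, and in the real tool the proposal step would be Claude Code editing the agent and the evaluation step would be a benchmark run, neither of which is modeled here.

```python
import random

def ratchet(initial, propose, evaluate, cycles):
    """Run an improve/evaluate loop that keeps only strict improvements."""
    best, best_score = initial, evaluate(initial)
    history = [best_score]
    for _ in range(cycles):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:
            # Keep: the candidate becomes the new baseline.
            best, best_score = candidate, score
        # Otherwise revert: the candidate is discarded and best is unchanged.
        history.append(best_score)
    return best, best_score, history

# Toy stand-ins (assumptions for illustration): the "agent" is a single
# number, a proposal is a random nudge, and the score is closeness to 3.0.
rng = random.Random(0)

def propose(x):
    return x + rng.uniform(-1.0, 1.0)

def evaluate(x):
    return -abs(x - 3.0)

best, score, history = ratchet(0.0, propose, evaluate, cycles=50)
print(best, score)
```

Because rejected candidates never replace the current best, the recorded score can only stay flat or rise from one cycle to the next, which is what makes an unattended overnight run safe to leave going.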
Performance Results
Tested on a real-world enterprise agent benchmark (tau2) with the skill running fully on autopilot:
- 25% performance increase after a single improvement cycle
Technical Background
The original research involved building a recursive language model architecture with a sandboxed REPL for trace analysis at scale, multi-agent pipelines, and other components. The developer found that most people building agents don't need this complexity and that Claude Code alone is capable enough for recursive self-improvement.
The framework tells your coding agent: here are the traces, here's how to analyze them, here's how to prioritize fixes, and here's how to verify them.
Open-source repository: https://github.com/kayba-ai/recursive-improve
📖 Read the full source: r/ClaudeAI