Karpathy's autoresearch project: AI agents run overnight LLM training experiments

✍️ OpenClawRadar | 📅 Published: March 9, 2026

What Karpathy's autoresearch project does

Andrej Karpathy released a tiny repository called "autoresearch" that demonstrates an "AI researcher in a loop" concept. The system uses an AI agent to autonomously run LLM training experiments overnight on a single GPU.

How it works

The agent follows this workflow:

  • Edits the train.py file
  • Runs 5-minute nanochat training experiments
  • Checks whether the validation bits-per-byte (val_bpb) metric improved
  • Repeats this cycle while you sleep
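
The cycle above can be sketched as a simple keep-if-better loop. This is a minimal illustration, not the autoresearch code: research_loop, propose_edit, run_experiment, and the toy stand-ins are hypothetical names, and the policy of keeping an edit only when val_bpb drops is an assumption (the article says only that the agent checks whether the metric improved).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    best_source: str                 # best version of the training script so far
    best_val_bpb: float              # lowest validation bits-per-byte seen
    history: List[Tuple[int, float, bool]] = field(default_factory=list)


def research_loop(
    initial_source: str,
    propose_edit: Callable[[str], str],      # hypothetical: the agent rewriting train.py
    run_experiment: Callable[[str], float],  # hypothetical: one timed run, returns val_bpb
    max_experiments: int,
) -> LoopResult:
    """Edit, run, compare, repeat. Lower val_bpb is treated as better."""
    best_source = initial_source
    best_bpb = run_experiment(best_source)   # baseline measurement
    result = LoopResult(best_source, best_bpb)
    for step in range(max_experiments):
        candidate = propose_edit(best_source)
        bpb = run_experiment(candidate)
        improved = bpb < best_bpb
        if improved:                         # assumed policy: keep only improvements
            best_source, best_bpb = candidate, bpb
        result.history.append((step, bpb, improved))
    result.best_source = best_source
    result.best_val_bpb = best_bpb
    return result


# Toy stand-ins so the sketch runs without a GPU or an LLM.
def toy_propose(source: str) -> str:
    return source + "x"                      # pretend each edit appends one change


def toy_run(source: str) -> float:
    return abs(len(source) - 3) + 1.0        # pretend three changes is the optimum


outcome = research_loop("", toy_propose, toy_run, max_experiments=5)
```

In a real setup, run_experiment would launch the 5-minute training run and parse val_bpb from its output, and propose_edit would be the agent's edit to train.py; both are left abstract here.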

Setup and configuration

The setup is deliberately minimal:

  • Hardware: One GPU
  • Files: One main file (train.py)
  • Metrics: One primary metric (val_bpb)

The human writes the research organization prompt in program.md, and the agent handles the code iteration.

Experiment throughput

With a fixed 5-minute budget per experiment, the system can run about 12 experiments per hour (60 minutes / 5 minutes per run).
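
The throughput figure is simple arithmetic, sketched below. The function names, the 8-hour night, and the per-run overhead parameter are assumptions for illustration; the article states only the 5-minute budget and the figure of roughly 12 runs per hour.

```python
def experiments_per_hour(budget_minutes: float, overhead_minutes: float = 0.0) -> int:
    """Whole experiments that fit in one hour at a fixed time budget."""
    per_run = budget_minutes + overhead_minutes
    if per_run <= 0:
        raise ValueError("time per run must be positive")
    return int(60 // per_run)


def experiments_per_night(
    budget_minutes: float, hours: float, overhead_minutes: float = 0.0
) -> int:
    """Whole experiments that fit in an unattended stretch of `hours`."""
    per_run = budget_minutes + overhead_minutes
    if per_run <= 0:
        raise ValueError("time per run must be positive")
    return int((hours * 60) // per_run)


hourly = experiments_per_hour(5)             # 60 / 5
nightly = experiments_per_night(5, hours=8)  # assumed 8-hour night: 480 / 5
```

Any time the agent spends editing or analyzing between runs lowers these numbers; for example, one minute of assumed overhead per run gives 10 experiments per hour instead of 12.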

The project is a practical demonstration of automated research: an AI agent explores parameter spaces and training configurations on its own, which could shorten experimentation cycles for developers working with language models.

📖 Read the full source: r/LocalLLaMA
