Karpathy's autoresearch project: AI agents run overnight LLM training experiments

What Karpathy's autoresearch project does
Andrej Karpathy released a tiny repository called "autoresearch" that demonstrates an "AI researcher in a loop" concept. The system uses an AI agent to autonomously run LLM training experiments overnight on a single GPU.
How it works
The agent follows this workflow:
- Continuously edits the train.py file
- Runs 5-minute nanochat training experiments
- Checks whether the validation bits-per-byte (val_bpb) metric improved
- Repeats this cycle while you sleep
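The cycle above can be sketched as a simple keep-if-better loop. This is a minimal illustration, not the repository's code: research_loop, toy_propose, and toy_evaluate are hypothetical names, the "edit" is reduced to changing one learning-rate value, and the fake metric stands in for a real 5-minute training run. Discarding a change that does not lower val_bpb is also an assumption; the source only says the agent checks whether the metric improved.

```python
import math
import random
from typing import Callable, List, Tuple


def research_loop(
    propose: Callable[[dict], dict],
    evaluate: Callable[[dict], float],
    initial: dict,
    n_experiments: int,
) -> Tuple[dict, float, List[float]]:
    """Keep a candidate only when it lowers val_bpb (lower is better)."""
    best_config = dict(initial)
    best_bpb = evaluate(best_config)
    history = [best_bpb]  # best val_bpb seen after each experiment
    for _ in range(n_experiments):
        candidate = propose(best_config)  # stands in for the agent editing train.py
        val_bpb = evaluate(candidate)     # stands in for one 5-minute training run
        if val_bpb < best_bpb:            # improvement: keep the change
            best_config, best_bpb = candidate, val_bpb
        history.append(best_bpb)          # otherwise the change is discarded (assumption)
    return best_config, best_bpb, history


# Toy stand-ins (hypothetical): a real agent would rewrite code, not one number.
_rng = random.Random(0)


def toy_propose(config: dict) -> dict:
    new = dict(config)
    new["lr"] = config["lr"] * _rng.choice([0.5, 0.8, 1.25, 2.0])
    return new


def toy_evaluate(config: dict) -> float:
    # Fake val_bpb curve with its minimum at lr = 0.01.
    return 1.0 + abs(math.log10(config["lr"]) + 2.0)


best_config, best_bpb, history = research_loop(
    toy_propose, toy_evaluate, {"lr": 0.1}, n_experiments=12
)
```

Because a candidate is kept only when it beats the current best, history never increases; plotting it gives a quick view of how much an overnight run actually gained.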
Setup and configuration
The project has a super minimal setup:
- Hardware: One GPU
- Files: One main file
- Metrics: One primary metric (val_bpb)
The human writes the research organization prompt in program.md, and the agent handles the code iteration.
Experiment throughput
With a fixed 5-minute budget per experiment, the system can run approximately 12 experiments per hour.
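The throughput figure is plain arithmetic, shown here as a small helper. The function name experiments_in, the 8-hour night, and the per-experiment overhead parameter are illustrative assumptions; the source states only the 5-minute budget and the roughly 12-per-hour figure.

```python
def experiments_in(
    hours: float,
    minutes_per_experiment: float = 5.0,
    overhead_minutes: float = 0.0,
) -> int:
    """Whole experiments that fit in the given wall-clock time."""
    total_minutes = hours * 60
    return int(total_minutes // (minutes_per_experiment + overhead_minutes))


per_hour = experiments_in(1)   # 60 / 5 = 12
overnight = experiments_in(8)  # 480 / 5 = 96, assuming an 8-hour night
with_overhead = experiments_in(8, overhead_minutes=1.0)  # 480 / 6 = 80
```

Any time the agent spends editing code or starting a run between experiments comes out of the same budget, so 12 per hour is best read as an upper bound.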
The approach is a practical demonstration of automated research: an AI agent explores hyperparameters and training configurations on its own, which could shorten experimentation cycles for developers working with language models.
📖 Read the full source: r/LocalLLaMA