Nanocode: Train Claude-like Coding Agents with JAX on TPUs

Nanocode is a library that demonstrates how to train your own Claude Code model end-to-end using Constitutional AI, following Anthropic's approach. Written entirely in JAX and optimized for TPUs, it adapts infrastructure from Karpathy's nanochat project.

Training Setup and Costs

The nanocode-d24 model (1.3B parameters) can be reproduced in approximately 9 hours on a TPU v6e-8 for about $200. The smaller nanocode-d20 model (477M parameters) trains in about 1.5 hours for $34. The project recommends Google's TPU Research Cloud (TRC) program, which gives free access to preemptible TPUs for a month, or the $300 in Google Cloud credits offered to new accounts.
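As a quick sanity check on these figures, the sketch below divides each quoted cost by the quoted training time. Both runs work out to roughly $22 per hour for the whole slice. The per-chip number assumes that a v6e-8 slice has 8 chips; the hourly rate is only what the quoted totals imply, not a published price.

```python
# Figures quoted in the text: (model, hours on a TPU v6e-8, total cost in USD).
RUNS = [
    ("nanocode-d24", 9.0, 200.0),
    ("nanocode-d20", 1.5, 34.0),
]

# Assumption: the "-8" in v6e-8 means the slice has 8 chips.
CHIPS_PER_SLICE = 8


def implied_hourly_rate(hours: float, cost_usd: float) -> float:
    """Total cost divided by wall-clock hours for the whole slice."""
    return cost_usd / hours


for name, hours, cost in RUNS:
    rate = implied_hourly_rate(hours, cost)
    print(f"{name}: ~${rate:.2f}/hour for the slice, "
          f"~${rate / CHIPS_PER_SLICE:.2f}/chip-hour")
```

The two runs imply nearly the same hourly rate, which suggests the quoted costs are simply training time multiplied by one fixed slice price.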

Technical Implementation

The training process includes:

Writing a SOUL.md file to define model alignment
Defining an agentic interface for world interaction
Generating synthetic data
Using preference optimization to align the model with SOUL.md
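The text does not say which preference-optimization objective nanocode uses. As an illustration of the last step only, here is a minimal sketch of one common choice, the Direct Preference Optimization (DPO) loss for a single preference pair. The function name, the argument names, and the beta value are all assumptions made for this example and are not taken from nanocode.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or a frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# The loss is lower when the policy favors the chosen response.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
print(dpo_loss(-15.0, -5.0, -10.0, -10.0))
```

In a setup like this, the chosen response would be the one that better follows SOUL.md, and the rejected response the one that follows it less well.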

Tokenization and Pre-training Differences

While the pre-training and tokenizer training processes are similar to nanochat's, nanocode adds coding data from The Stack-V2 at a 1:5 ratio in both the pre-training and tokenizer mixtures. This results in stronger coding performance but reduces tokenization efficiency on general text.
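A 1:5 code-to-text ratio means about one document in six (roughly 16.7%) comes from the code corpus. The setup commands listed in this article request 60 and 300 units of the two datasets, which is the same proportion. The sketch below shows one simple way to draw documents at that ratio; the sampler is hypothetical and is not nanocode's actual data loader.

```python
import random


def sample_source(rng: random.Random, code_parts: int = 1, text_parts: int = 5) -> str:
    """Pick which corpus the next document comes from, at code:text = 1:5."""
    return rng.choices(["code", "text"], weights=[code_parts, text_parts], k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_source(rng) for _ in range(60_000)]
code_fraction = draws.count("code") / len(draws)
print(f"code fraction: {code_fraction:.3f} (target {1 / 6:.3f})")
```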

The tokenizer comparison reports the following differences for nanocode relative to nanochat: -50.9% on code, +7.9% on news, and -27.6% on Korean. Nanocode tokenizes code markedly better, while nanochat performs better on Korean text.
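The text does not state which metric or sign convention lies behind these percentages. One common way to compare tokenizers is bytes per token, where a higher value means better compression. The sketch below computes a relative difference under that assumption; the token counts are made up for illustration and are not measurements from either tokenizer.

```python
def bytes_per_token(text: str, num_tokens: int) -> float:
    """Average number of UTF-8 bytes covered by each token."""
    return len(text.encode("utf-8")) / num_tokens


def relative_diff_pct(ours: float, baseline: float) -> float:
    """Positive means `ours` packs more bytes into each token than `baseline`."""
    return (ours - baseline) / baseline * 100.0


sample = "def add(a, b):\n    return a + b\n"  # 32 bytes of code

# Hypothetical token counts for the same text under two tokenizers.
ours = bytes_per_token(sample, num_tokens=8)
baseline = bytes_per_token(sample, num_tokens=12)
print(f"ours: {ours:.2f} B/token, baseline: {baseline:.2f} B/token, "
      f"diff: {relative_diff_pct(ours, baseline):+.1f}%")
```

A comparison that reports the change in token count instead would flip the sign, so the convention needs to be known before figures like these are read.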

Commands and Setup

export NANOCODE_BASE_DIR="$HOME/.cache/nanocode"
export MODEL_TAG=d24
python -m data.pretrain -d fineweb-edu -n 300
python -m data.pretrain -d the-stack-v2-dedup -n 60
python -m scripts.tok_train --max-chars=2000000000
python -m scripts.tok_eval

The models are trained with a param:data ratio of 8, following nanochat's scaling law analysis. While optimized for TPUs, nanocode should also work on NVIDIA GPUs out of the box.
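To give a sense of scale, the sketch below turns that ratio into a training token budget. It assumes the ratio means training tokens per model parameter, as in nanochat's usage, and that the parameter counts quoted above are the ones the ratio applies to. Both are assumptions, and the helper is hypothetical.

```python
def token_budget(num_params: int, tokens_per_param: float = 8.0) -> int:
    """Training tokens implied by a fixed tokens-per-parameter ratio."""
    return int(num_params * tokens_per_param)


# Parameter counts as quoted in the text.
MODELS = [("nanocode-d24", 1_300_000_000), ("nanocode-d20", 477_000_000)]

for name, params in MODELS:
    print(f"{name}: ~{token_budget(params) / 1e9:.1f}B training tokens")
```

Under these assumptions the d24 model would see about 10.4B tokens and the d20 model about 3.8B.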
