LLM Circuit Finder: Duplicate 3 Layers to Boost Reasoning

The llm-circuit-finder toolkit implements and extends David Ng's RYS method to discover and exploit 'reasoning circuits' hidden inside transformer models. The core finding: certain contiguous blocks of layers act as indivisible cognitive units. Duplicating them in the forward pass - same weights, no training, no merging - makes models measurably smarter on specific capabilities.

Key Results

Devstral-Small-2-24B with layers 12, 13, 14 duplicated once:

BBH Logical Deduction: 0.22 → 0.76 (+245%)
GSM8K (strict): 0.48 → 0.64 (+33%)
MBPP (code gen): 0.72 → 0.78 (+8%)
Average improvement: +8% across all metrics with nothing degraded

Qwen2.5-Coder-32B with layers 7, 8, 9 duplicated once:

Reasoning probe (causal + logic + nav): 76.5% → 94.1% (+23%)
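The percentages in parentheses are relative gains over the baseline score, not absolute point differences. A minimal sketch of that arithmetic, using only the scores quoted above (the function name `relative_gain` is illustrative, not part of the toolkit):

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

# Scores as quoted in the results above.
results = {
    "BBH Logical Deduction": (0.22, 0.76),
    "GSM8K (strict)": (0.48, 0.64),
    "MBPP (code gen)": (0.72, 0.78),
    "Reasoning probe": (76.5, 94.1),
}

for name, (before, after) in results.items():
    print(f"{name}: {relative_gain(before, after):+.0f}%")
```

Read this way, the BBH jump from 0.22 to 0.76 is 54 points in absolute terms, which is where the +245% relative figure comes from.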

How It Works

Transformers organize themselves during training into functional circuits - multi-layer processing units that perform complete cognitive operations. These circuits are indivisible: duplicating a single layer does almost nothing, but duplicating the right block of 3-4 layers gives the model a second pass through its reasoning pipeline.
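As a toy illustration of what "a second pass" means, the sketch below treats each layer as a plain function and runs an input through a list of layer indices. Repeating an index reuses the same function object, which mirrors the "same weights, no training" idea. The layers here are arbitrary stand-ins and `run_path` is a hypothetical helper, not the toolkit's implementation.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]

def run_path(layers: Sequence[Layer], path: Sequence[int], x: float) -> float:
    """Apply layers in the order given by `path`; repeated indices reuse the same layer."""
    for idx in path:
        x = layers[idx](x)
    return x

# Toy stand-ins for transformer blocks: layer k simply adds k to its input.
layers: List[Layer] = [lambda h, k=k: h + k for k in range(6)]

baseline = [0, 1, 2, 3, 4, 5]
duplicated = [0, 1, 2, 3, 1, 2, 3, 4, 5]  # block 1-3 is executed twice

print(run_path(layers, baseline, 0.0))
print(run_path(layers, duplicated, 0.0))
```

The list of layers never grows; only the routing through it changes.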

Different models have different circuits in different places:

Devstral-24B (40 layers): reasoning circuit at layers 12-14
Qwen2.5-32B (64 layers): reasoning circuit at layers 7-9

The boundaries are sharp. Shift the block by one layer in either direction and the improvement disappears or inverts.

Different Duplication Patterns Create Different Modes

Same weights on disk, same VRAM for the base model, just different routing:

Double-pass 13-16: Math ↑↑, EQ ↑
Triple-pass 13-16: Math ↑, EQ ↑↑
Interleaved 13,13,14,14,15,15,16: Math ↑↑↑, EQ ↓ (pure math mode)
Quadruple-pass 13-16: Math —, EQ ↑↑ (EQ mode, math neutral)
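The patterns above can be written down as explicit layer orderings. The sketch below is one way to generate them, assuming a 40-layer model and assuming that "N-pass" means the block runs N times in total; both helper names are hypothetical.

```python
from typing import Dict, List

def multi_pass(start: int, end: int, passes: int, n_layers: int) -> List[int]:
    """Run the block start..end (inclusive) `passes` times in a row."""
    block = list(range(start, end + 1))
    return list(range(start)) + block * passes + list(range(end + 1, n_layers))

def per_layer_repeats(repeats: Dict[int, int], n_layers: int) -> List[int]:
    """Repeat individual layers in place; layers not listed run once."""
    path: List[int] = []
    for i in range(n_layers):
        path.extend([i] * repeats.get(i, 1))
    return path

N_LAYERS = 40  # assumption: a 40-layer model

double_pass = multi_pass(13, 16, 2, N_LAYERS)
triple_pass = multi_pass(13, 16, 3, N_LAYERS)
quadruple_pass = multi_pass(13, 16, 4, N_LAYERS)
# Interleaved pattern as listed: 13,13,14,14,15,15,16
interleaved = per_layer_repeats({13: 2, 14: 2, 15: 2}, N_LAYERS)

print(interleaved[13:20])
```

Every pattern indexes the same 40 layers, so the extra cost is compute per token rather than additional weights.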

Quick Start

Find circuits in your model:

pip install gguf requests tqdm
python sweep.py \
  --model /path/to/model.gguf \
  --llama-server /path/to/llama-server \
  --tmpdir /dev/shm/rys \
  --results pass.jsonl \
  --block-sizes 3 4 5 \
  --stride 1 \
  --start-min 10 --start-max 20 \
  --skip-baseline \
  --port 8099 \
  --server-args --device Vulkan1,Vulkan2
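To get a feel for the size of the search space those flags describe, the following sketch enumerates candidate blocks. It is not taken from sweep.py: the helper names are hypothetical, and treating --start-max as inclusive is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple

def candidate_blocks(
    block_sizes: Sequence[int],
    start_min: int,
    start_max: int,
    stride: int,
    n_layers: int,
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs, end inclusive, for every block that fits in the model."""
    for size in block_sizes:
        # Assumption: start_max is inclusive.
        for start in range(start_min, start_max + 1, stride):
            end = start + size - 1
            if end < n_layers:
                yield start, end

def duplicated_path(start: int, end: int, n_layers: int) -> List[int]:
    """Layer order with the block start..end executed twice."""
    return (
        list(range(end + 1))
        + list(range(start, end + 1))
        + list(range(end + 1, n_layers))
    )

# Mirrors --block-sizes 3 4 5 --stride 1 --start-min 10 --start-max 20 on a 40-layer model.
candidates = list(candidate_blocks([3, 4, 5], 10, 20, 1, 40))
print(len(candidates), "candidate blocks")
```

Under these assumptions the sweep covers 33 configurations, each of which has to be loaded and scored.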

Apply a known circuit:

# Duplicate layers 12-14 in Devstral
python layer_path.py model.gguf improved.gguf \
  -p "0..14,12,13,14,15..39" -v

# Duplicate layers 7-9 in Qwen2.5-32B
python layer_path.py model.gguf improved.gguf -p "0..9,7,8,9,10..63" -v

# Triple-pass example
python layer_path.py model.gguf experiment.gguf -p "0..16,13,14,15,16,13,14,15,16,17..39" -v
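The -p strings above mix `a..b` ranges with single layer indices. The sketch below expands such a string into an explicit layer order so it can be checked before a model is written. It assumes ranges are inclusive at both ends, which is consistent with `15..39` ending at the last layer of a 40-layer model; `parse_path` is a hypothetical helper, not the parser used by layer_path.py.

```python
from collections import Counter
from typing import List

def parse_path(spec: str) -> List[int]:
    """Expand a spec such as "0..14,12,13,14,15..39" into a list of layer indices."""
    path: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if ".." in token:
            lo, hi = token.split("..")
            # Assumption: ranges are inclusive at both ends.
            path.extend(range(int(lo), int(hi) + 1))
        else:
            path.append(int(token))
    return path

path = parse_path("0..14,12,13,14,15..39")
repeated = sorted(i for i, count in Counter(path).items() if count > 1)
print(len(path), "layer executions; repeated layers:", repeated)
```

A check like this catches off-by-one mistakes in a spec, which matters given how sensitive the results are to block boundaries.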

Validate with established benchmarks:

# Start the server with modified model
llama-server -m improved.gguf --port 8089 -ngl 99 --device Vulkan1,Vulkan2
# Run lm-evaluation-harness

The entire process - sweep, discovery, validation - was done on two AMD consumer GPUs (RX 7900 XT + RX 6950 XT) in one evening.

📖 Read the full source: HN LLM Tools
