LLM Circuit Finder: Duplicate 3 layers to boost reasoning without training

The llm-circuit-finder toolkit implements and extends David Ng's RYS method to find and exploit "reasoning circuits" hidden inside transformer models. The core finding: certain contiguous blocks of layers act as indivisible cognitive units. Duplicating such a block in the forward pass (same weights, no training, no merging) measurably improves the model on specific capabilities.
Key Results
Devstral-Small-2-24B with layers 12, 13, 14 duplicated once:
- BBH Logical Deduction: 0.22 → 0.76 (+245%)
- GSM8K (strict): 0.48 → 0.64 (+33%)
- MBPP (code gen): 0.72 → 0.78 (+8%)
- Average improvement: +8% across all metrics with nothing degraded
Qwen2.5-Coder-32B with layers 7, 8, 9 duplicated once:
- Reasoning probe (causal + logic + nav): 76.5% → 94.1% (+23%)
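The percentages above are relative gains over the baseline, not percentage-point differences: logical deduction rises by 54 points, which is +245% relative to 0.22. A quick check of that arithmetic, using only the scores quoted above (the dictionary keys are just labels):

```python
# Relative improvement as reported above: (after - before) / before.
# Scores are copied from the Key Results list; nothing here is re-measured.
RESULTS = {
    "BBH Logical Deduction": (0.22, 0.76),
    "GSM8K (strict)": (0.48, 0.64),
    "MBPP (code gen)": (0.72, 0.78),
    "Reasoning probe (Qwen2.5-Coder-32B)": (76.5, 94.1),
}

def relative_gain(before: float, after: float) -> int:
    """Return the relative change in percent, rounded to the nearest integer."""
    return round(100 * (after - before) / before)

for name, (before, after) in RESULTS.items():
    print(f"{name}: {before} -> {after} ({relative_gain(before, after):+d}%)")
```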
How It Works
Transformers organize themselves during training into functional circuits - multi-layer processing units that perform complete cognitive operations. These circuits are indivisible: duplicating a single layer does almost nothing, but duplicating the right block of 3-4 layers gives the model a second pass through its reasoning pipeline.
Different models have different circuits in different places:
- Devstral-24B (40 layers): reasoning circuit at layers 12-14
- Qwen2.5-32B (64 layers): reasoning circuit at layers 7-9
The boundaries are sharp. Shift the block by one layer in either direction and the improvement disappears or inverts.
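To make "same weights, different routing" concrete, here is a toy sketch. The scalar "layers" are hypothetical stand-ins for real attention + MLP blocks, so it cannot reproduce any reasoning gain; it only shows that the model keeps one list of 40 layers and that duplicating layers 12-14 changes nothing but the order of indices the forward pass walks through.

```python
from typing import Callable, Sequence

# Hypothetical toy "layers": each one adds its own index to a scalar hidden state.
# Real transformer layers are attention + MLP blocks; only the routing matters here.
Layer = Callable[[float], float]
LAYERS: list[Layer] = [lambda h, k=k: h + k for k in range(40)]  # 40 layers, like Devstral-24B

def forward(layers: Sequence[Layer], path: Sequence[int], h: float) -> float:
    """Apply layers in the order given by `path`; repeated indices reuse the same layer object."""
    for idx in path:
        h = layers[idx](h)
    return h

base_path = list(range(40))                                      # 0..39
rys_path = list(range(15)) + [12, 13, 14] + list(range(15, 40))  # 0..14,12,13,14,15..39

print(len(base_path), forward(LAYERS, base_path, 0.0))  # 40 780.0
print(len(rys_path), forward(LAYERS, rys_path, 0.0))    # 43 819.0 (still only 40 layer objects)
```

The duplicated path runs 43 layer evaluations instead of 40, so in this scheme inference costs roughly 3/40 = 7.5% more compute per token while introducing no new weights.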
Different Duplication Patterns Create Different Modes
Same weights on disk, same VRAM for the base model, just different routing:
- Double-pass 13-16: Math ↑↑, EQ ↑
- Triple-pass 13-16: Math ↑, EQ ↑↑
- Interleaved 13,13,14,14,15,15,16: Math ↑↑↑, EQ ↓ (pure math mode)
- Quadruple-pass 13-16: Math —, EQ ↑↑ (EQ mode, math neutral)
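The four modes differ only in the layer path. The sketch below builds each one under stated assumptions: a 40-layer model (inferred from the `17..39` tail of the triple-pass command under Quick Start), "N-pass" meaning the 13-16 block runs N times in total (which is how that triple-pass command is written), and the interleaved sequence simply replacing layers 13-16 in place. `multi_pass` and `interleaved` are hypothetical helpers, not part of the toolkit.

```python
def multi_pass(n_layers: int, start: int, end: int, passes: int) -> list[int]:
    """Path in which block [start, end] (inclusive) runs `passes` times in total."""
    block = list(range(start, end + 1))
    return list(range(start)) + block * passes + list(range(end + 1, n_layers))

def interleaved(n_layers: int, start: int, end: int, sequence: list[int]) -> list[int]:
    """Path in which block [start, end] is replaced by an explicit index sequence (assumed splice)."""
    return list(range(start)) + sequence + list(range(end + 1, n_layers))

N_LAYERS = 40  # assumption: Devstral-24B, per the triple-pass command
modes = {
    "double-pass": multi_pass(N_LAYERS, 13, 16, 2),
    "triple-pass": multi_pass(N_LAYERS, 13, 16, 3),
    "quadruple-pass": multi_pass(N_LAYERS, 13, 16, 4),
    "interleaved": interleaved(N_LAYERS, 13, 16, [13, 13, 14, 14, 15, 15, 16]),
}
for name, path in modes.items():
    print(f"{name:15s} {len(path)} layer evaluations per token")
```

The path lengths (44, 48, 52 and 43, versus 40 for the base model) give a rough sense of the extra compute each mode costs.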
Quick Start
Find circuits in your model:
pip install gguf requests tqdm
python sweep.py \
--model /path/to/model.gguf \
--llama-server /path/to/llama-server \
--tmpdir /dev/shm/rys \
--results pass.jsonl \
--block-sizes 3 4 5 \
--stride 1 \
--start-min 10 --start-max 20 \
--skip-baseline \
--port 8099 \
--server-args --device Vulkan1,Vulkan2
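The sweep flags define a grid of candidate blocks. As a rough sketch of its size, the following enumerates the blocks those flags describe for a 40-layer model and renders one in the `-p` path syntax that layer_path.py takes in the next step. It assumes `--start-max` is inclusive, which may not match sweep.py; `sweep_candidates` and `path_spec` are hypothetical helpers.

```python
from itertools import product
from typing import Iterator

def sweep_candidates(block_sizes: list[int], start_min: int, start_max: int,
                     stride: int, n_layers: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) blocks to try, with `end` inclusive.

    Assumptions (not taken from sweep.py): start_max is inclusive, and a block
    must leave at least one layer after it.
    """
    for size, start in product(block_sizes, range(start_min, start_max + 1, stride)):
        end = start + size - 1
        if end < n_layers - 1:
            yield start, end

def path_spec(n_layers: int, start: int, end: int) -> str:
    """Render a duplicate-once path in the `-p` syntax shown for layer_path.py."""
    block = ",".join(str(i) for i in range(start, end + 1))
    return f"0..{end},{block},{end + 1}..{n_layers - 1}"

# The flags from the command above, applied to a 40-layer model such as Devstral-24B.
candidates = list(sweep_candidates([3, 4, 5], 10, 20, 1, 40))
print(len(candidates))        # 33 (11 start positions x 3 block sizes)
print(path_spec(40, 12, 14))  # 0..14,12,13,14,15..39
```

If each candidate is scored separately, even this modest grid means 33 evaluation runs. `--tmpdir` points at /dev/shm, a RAM-backed tmpfs on Linux, presumably so the temporary files for each candidate stay off disk.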
Apply a known circuit:
# Duplicate layers 12-14 in Devstral
python layer_path.py model.gguf improved.gguf \
-p "0..14,12,13,14,15..39" -v
# Duplicate layers 7-9 in Qwen2.5-32B
python layer_path.py model.gguf improved.gguf \
-p "0..9,7,8,9,10..63" -v
# Triple-pass example
python layer_path.py model.gguf experiment.gguf \
-p "0..16,13,14,15,16,13,14,15,16,17..39" -v
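The `-p` argument reads as a comma-separated list of layer indices and `a..b` ranges. Assuming ranges are inclusive (consistent with `0..39` covering Devstral's 40 layers and `0..63` covering Qwen2.5-32B's 64), this small hypothetical parser, not the one in layer_path.py, expands each example and confirms which layers it repeats:

```python
from collections import Counter

def parse_layer_path(spec: str) -> list[int]:
    """Expand a spec like "0..14,12,13,14,15..39" into explicit layer indices.

    Assumes `a..b` is an inclusive range; whitespace around items is ignored.
    """
    layers: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if ".." in item:
            lo, hi = (int(x) for x in item.split(".."))
            layers.extend(range(lo, hi + 1))
        else:
            layers.append(int(item))
    return layers

def repeated_layers(path: list[int]) -> dict[int, int]:
    """Map each layer that runs more than once to its total run count."""
    return {layer: n for layer, n in sorted(Counter(path).items()) if n > 1}

for spec in ("0..14,12,13,14,15..39",
             "0..9,7,8,9,10..63",
             "0..16,13,14,15,16,13,14,15,16,17..39"):
    path = parse_layer_path(spec)
    print(len(path), repeated_layers(path))
```

The three paths expand to 43, 67 and 48 layer evaluations, repeating layers 12-14, layers 7-9, and layers 13-16 (three runs each), respectively.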
Validate with established benchmarks:
# Start the server with modified model
llama-server -m improved.gguf --port 8089 -ngl 99 --device Vulkan1,Vulkan2
# Run lm-evaluation-harness
The entire process (sweep, discovery and validation) was carried out on two AMD consumer GPUs (RX 7900 XT + RX 6950 XT) in one evening.
📖 Read the full source: HN LLM Tools