LLM Architecture Gallery: Visual Reference for Llama 3, DeepSeek V3, Gemma 3

Sebastian Raschka's LLM Architecture Gallery collects the architecture figures and fact sheets from two articles, "The Big LLM Architecture Comparison" and "A Dream of Spring for Open-Weight LLMs", and focuses specifically on the architecture panels. Each figure can be clicked to enlarge it, and each model title links to the corresponding article section.

Key Model Details

The gallery gives a fact sheet for each model, including the following:

Llama 3 8B: 8B parameters, released 2024-04-18, dense decoder with GQA attention and RoPE positional embeddings, serves as the pre-norm baseline
OLMo 2 7B: 7B parameters, released 2024-11-25, dense decoder with MHA and QK-Norm, uses inside-residual post-norm instead of pre-norm
DeepSeek V3: 671B total parameters (37B active), released 2024-12-26, sparse MoE decoder with MLA attention, uses dense prefix plus shared expert
DeepSeek R1: 671B total parameters (37B active), released 2025-01-20, sparse MoE decoder with MLA attention, same architecture as DeepSeek V3 but with reasoning-oriented training
Gemma 3 27B: 27B parameters, released 2025-03-11, dense decoder with GQA and QK-Norm, uses 5:1 sliding-window/global attention ratio
Mistral Small 3.1 24B: 24B parameters, released 2025-03-18, dense decoder with standard GQA, latency-focused design with smaller KV cache
Llama 4 Maverick: 400B total parameters (17B active), released 2025-04-05, sparse MoE decoder with GQA attention, alternates dense and MoE blocks
Qwen3 235B-A22B: 235B total parameters (22B active), released 2025-04-28, sparse MoE decoder with GQA and QK-Norm, optimized for serving efficiency without shared expert
Qwen3 32B: 32B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference dense Qwen stack with 8 KV heads
Qwen3 4B: 4B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, compact stack with 151k vocabulary
Qwen3 8B: 8B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference Qwen3 dense stack with 8 KV heads
SmolLM3 3B: 3B parameters, released 2025-06-19, dense decoder with GQA, experiments with periodic NoPE layers
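
The total versus active parameter counts listed for the MoE models can be compared directly. The following is a minimal sketch, not part of the gallery itself: the FactSheet class and its field names are hypothetical, and only the parameter counts come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactSheet:
    """Hypothetical record for one gallery entry (not an official schema)."""
    name: str
    total_params_b: float                    # total parameters, in billions
    active_params_b: Optional[float] = None  # None means a dense model

    @property
    def is_moe(self) -> bool:
        return self.active_params_b is not None

    @property
    def active_fraction(self) -> float:
        # Dense models use all of their parameters for every token.
        if self.active_params_b is None:
            return 1.0
        return self.active_params_b / self.total_params_b


# Parameter counts copied from the list above.
SHEETS = [
    FactSheet("Llama 3 8B", 8),
    FactSheet("DeepSeek V3", 671, 37),
    FactSheet("Llama 4 Maverick", 400, 17),
    FactSheet("Qwen3 235B-A22B", 235, 22),
]

for sheet in SHEETS:
    kind = "MoE" if sheet.is_moe else "dense"
    print(f"{sheet.name:<18} {kind:<6} active fraction: {sheet.active_fraction:.3f}")
```

Read this way, the sparse models in the list activate roughly 4 to 10 percent of their weights per token, which is the main reason their per-token compute sits far below what the total parameter count suggests.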

Practical Features

The gallery includes an issue tracker for reporting inaccurate fact sheets, mislabeled architectures, or broken links. A physical poster version is available via Zazzle with a high-resolution export at 14570 x 12490 pixels (56 MB PNG file, 182 megapixels).

For developers working with AI coding agents, this resource provides concrete architectural details that can inform model selection, fine-tuning decisions, and performance optimization. The side-by-side comparison format makes it easier to understand trade-offs between different architectural choices.
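
One trade-off that recurs across the fact sheets is KV cache size, which depends on the number of KV heads. The sketch below is a rough back-of-the-envelope estimate under stated assumptions: the layer count, head counts, and head dimension are illustrative values chosen for this example, not figures taken from the gallery, and the formula ignores sliding-window attention and MLA compression.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache bytes needed per token for full (global) attention.

    The factor of 2 accounts for storing one key vector and one value
    vector per KV head in every layer. bytes_per_elem=2 assumes a 16-bit
    cache dtype such as bfloat16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Illustrative shape (assumed for this example, not from the gallery).
N_LAYERS = 32
N_QUERY_HEADS = 32   # MHA keeps one KV head per query head
N_KV_HEADS = 8       # GQA shares each KV head across several query heads
HEAD_DIM = 128

mha_bytes = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)
gqa_bytes = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(f"MHA: {mha_bytes // 1024} KiB per token")
print(f"GQA: {gqa_bytes // 1024} KiB per token")
print(f"GQA cache is {mha_bytes // gqa_bytes}x smaller")
```

With these assumed values, the estimate scales linearly in the number of KV heads, so going from 32 to 8 KV heads cuts the cache by a factor of four. Real deployments differ once sliding-window layers, MLA, or cache quantization are involved.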

📖 Read the full source: HN LLM Tools

