LLM Architecture Gallery: Visual Reference for Model Designs

By OpenClawRadar. Published: March 16, 2026

Sebastian Raschka's LLM Architecture Gallery is a collection of architecture figures and fact sheets drawn from the articles The Big LLM Architecture Comparison and A Dream of Spring for Open-Weight LLMs, limited to the architecture panels. Each figure can be clicked to enlarge it, and each model title links to the corresponding article section.

Key Model Details

The gallery provides specific architectural specifications for numerous models:

  • Llama 3 8B: 8B parameters, released 2024-04-18, dense decoder with GQA attention and RoPE positional encoding, serves as pre-norm baseline
  • OLMo 2 7B: 7B parameters, released 2024-11-25, dense decoder with MHA and QK-Norm, uses inside-residual post-norm instead of pre-norm
  • DeepSeek V3: 671B total parameters (37B active), released 2024-12-26, sparse MoE decoder with MLA attention, uses dense prefix plus shared expert
  • DeepSeek R1: 671B total parameters (37B active), released 2025-01-20, sparse MoE decoder with MLA attention, architecture matches DeepSeek V3 with reasoning-oriented training
  • Gemma 3 27B: 27B parameters, released 2025-03-11, dense decoder with GQA and QK-Norm, uses 5:1 sliding-window/global attention ratio
  • Mistral Small 3.1 24B: 24B parameters, released 2025-03-18, dense decoder with standard GQA, latency-focused design with smaller KV cache
  • Llama 4 Maverick: 400B total parameters (17B active), released 2025-04-05, sparse MoE decoder with GQA attention, alternates dense and MoE blocks
  • Qwen3 235B-A22B: 235B total parameters (22B active), released 2025-04-28, sparse MoE decoder with GQA and QK-Norm, optimized for serving efficiency without shared expert
  • Qwen3 32B: 32B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference dense Qwen stack with 8 KV heads
  • Qwen3 4B: 4B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, compact stack with 151k vocabulary
  • Qwen3 8B: 8B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference Qwen3 dense stack with 8 KV heads
  • SmolLM3 3B: 3B parameters, released 2025-06-19, dense decoder with GQA, experiments with periodic NoPE layers
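
The total versus active parameter counts listed above can be compared directly. The following is a minimal sketch, not part of the gallery itself: the `ModelSpec` class, the `MODELS` list, and the `active_fractions` helper are hypothetical names introduced for illustration, and only a subset of the listed models is encoded, using the figures given in the list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record for one fact-sheet entry (illustrative only)."""
    name: str
    total_b: float              # total parameters, in billions
    active_b: Optional[float]   # active parameters per token (MoE); None if dense
    attention: str              # attention variant named in the list

    @property
    def is_moe(self) -> bool:
        return self.active_b is not None

    @property
    def active_fraction(self) -> float:
        # Dense models use all of their parameters for every token.
        return self.active_b / self.total_b if self.is_moe else 1.0


# Figures copied from the list above; subset chosen for brevity.
MODELS: List[ModelSpec] = [
    ModelSpec("Llama 3 8B", 8, None, "GQA"),
    ModelSpec("DeepSeek V3", 671, 37, "MLA"),
    ModelSpec("Llama 4 Maverick", 400, 17, "GQA"),
    ModelSpec("Qwen3 235B-A22B", 235, 22, "GQA"),
    ModelSpec("Qwen3 32B", 32, None, "GQA"),
]


def active_fractions(models: List[ModelSpec]) -> Dict[str, float]:
    """Map each sparse MoE model to the share of parameters active per token."""
    return {m.name: m.active_fraction for m in models if m.is_moe}


if __name__ == "__main__":
    for name, frac in active_fractions(MODELS).items():
        print(f"{name}: {frac:.2%} of parameters active per token")
```

With these figures, DeepSeek V3 activates roughly 5.5% of its parameters per token, Llama 4 Maverick about 4.25%, and Qwen3 235B-A22B about 9.4%. This is one way to read the sparsity trade-off across the MoE entries.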

Practical Features

The gallery includes an issue tracker for reporting inaccurate fact sheets, mislabeled architectures, or broken links. A physical poster version is available via Zazzle with a high-resolution export at 14570 x 12490 pixels (56 MB PNG file, 182 megapixels).

For developers working with AI coding agents, this resource provides concrete architectural details that can inform model selection, fine-tuning decisions, and performance optimization. The side-by-side comparison format makes it easier to understand trade-offs between different architectural choices.

📖 Read the full source: HN LLM Tools
