Running a 1 Trillion Parameter LLM Locally on AMD Ryzen AI Max+ Cluster

AMD's technical article details how to build a small-scale distributed inference cluster using four Framework Desktop systems with Ryzen AI Max+ 395 processors and run the Kimi K2.5 open-source model (1 trillion parameters, 375GB) using llama.cpp RPC. The setup treats the four machines as a single logical AI accelerator.
Hardware and Software Stack
- Hardware: 4x Framework Desktop - AMD Ryzen AI Max+ 395 - 128GB
- AI Framework: AMD ROCm
- Inference Engine: llama.cpp RPC
- OS: Ubuntu 24.04.3 LTS
- Model: Kimi-K2.5 (UD_Q2_K_XL) (375GB)
- Network: 5Gbps over Ethernet
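As a rough illustration of how llama.cpp RPC ties the nodes in this stack together, the sketch below builds the command lines involved: each worker runs rpc-server, and the head node passes the worker addresses to llama-cli via --rpc. The IP addresses and the model file name are hypothetical placeholders, and the article's actual invocation may differ.

```python
# Sketch only: builds llama.cpp RPC command lines as argument lists.
# The worker IPs and model file name below are hypothetical placeholders.
from typing import List

RPC_PORT = 50052  # default port used by llama.cpp's rpc-server


def rpc_server_cmd(host: str = "0.0.0.0", port: int = RPC_PORT) -> List[str]:
    """Command a worker node would run to expose its GPU over RPC.

    Binding to 0.0.0.0 is only reasonable on a trusted, private network.
    """
    return ["./rpc-server", "-H", host, "-p", str(port)]


def head_node_cmd(model_path: str, workers: List[str], port: int = RPC_PORT) -> List[str]:
    """Command the head node would run, offloading layers to the workers."""
    servers = ",".join(f"{w}:{port}" for w in workers)
    return ["./llama-cli", "-m", model_path, "--rpc", servers, "-ngl", "99"]


workers = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]  # hypothetical
print(" ".join(rpc_server_cmd()))
print(" ".join(head_node_cmd("Kimi-K2.5-UD-Q2_K_XL.gguf", workers)))
```

Here three workers plus the head node's own GPU make up the four-machine cluster; whether the head node also runs its own rpc-server is a deployment choice this sketch does not cover.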
Technical Setup: Extended VRAM Allocation
On each Ryzen AI Max+ system, first set iGPU Memory Size to 512MB in the BIOS. The BIOS alone allows at most 96GB of dedicated VRAM per node (384GB total across four nodes). Raising the Translation Table Manager (TTM) limits through kernel parameters increases this to 120GB per node (480GB total).
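To see why the extra allocation matters, the following sketch compares the cluster-wide memory under each limit against the 375GB model size. All figures come from the article; "GB" is used loosely, as in the source, and the idea that the leftover space goes to the KV cache and compute buffers is a general assumption about inference rather than a number from the article.

```python
# Sketch only: cluster memory headroom under the two per-node limits.
NODES = 4
MODEL_GB = 375        # Kimi-K2.5 (UD_Q2_K_XL)
BIOS_LIMIT_GB = 96    # maximum dedicated VRAM per node via BIOS
TTM_LIMIT_GB = 120    # per-node limit after raising the TTM parameters


def headroom_gb(per_node_gb: int, nodes: int = NODES, model_gb: int = MODEL_GB) -> int:
    """Memory left across the cluster once the model weights are loaded."""
    return per_node_gb * nodes - model_gb


print(headroom_gb(BIOS_LIMIT_GB))  # 9
print(headroom_gb(TTM_LIMIT_GB))   # 105
```

With the BIOS limit the weights technically fit, but only about 9GB would remain across all four nodes for everything else, which is presumably why the article raises the limit.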
Configure kernel parameters:
sudo nano /etc/default/grub
Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT= and append the two parameters inside the quotes, so that the quoted value reads:
"quiet splash ttm.pages_limit=30720000 amdgpu.gttsize=120000"
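TTM limits are expressed in 4 KB pages. Calculation for 120GB: (120 * 1024 * 1024) / 4.096 = 30720000 pages. Equivalently, the 120000 MiB passed to amdgpu.gttsize is 120000 * 1024 / 4 = 30720000 pages of 4 KiB, so both parameters describe the same limit.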
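The page arithmetic can be checked with a short sketch. It assumes a 4 KiB page, as stated above, and that amdgpu.gttsize is given in MiB; the function name is illustrative only.

```python
# Sketch only: convert a GTT size in MiB to a count of 4 KiB TTM pages.
PAGE_KIB = 4  # page size assumed from the article


def mib_to_ttm_pages(size_mib: int, page_kib: int = PAGE_KIB) -> int:
    """Number of pages needed to cover size_mib mebibytes."""
    return size_mib * 1024 // page_kib


gtt_mib = 120000                      # value used for amdgpu.gttsize
pages = mib_to_ttm_pages(gtt_mib)     # value for ttm.pages_limit
article_value = round(120 * 1024 * 1024 / 4.096)  # the article's formula

print(pages)          # 30720000
print(article_value)  # 30720000
```

Deriving ttm.pages_limit from the gttsize value keeps the two parameters in step if a different per-node allocation is chosen.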
After saving and exiting, run:
sudo update-grub
sudo reboot
Verify configuration:
$ sudo dmesg | grep "amdgpu.*memory"
[drm] amdgpu: 512M of VRAM memory ready
[drm] amdgpu: 120000M of GTT memory ready.
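When the same check has to be repeated on four nodes, it can help to parse the dmesg output instead of reading it by eye. The sketch below is one way to do that; it assumes the log lines have exactly the format shown above, and the function name is illustrative.

```python
# Sketch only: extract the GTT size from amdgpu's dmesg line.
import re
from typing import Optional

GTT_RE = re.compile(r"amdgpu: (\d+)M of GTT memory ready")


def gtt_size_mib(dmesg_text: str) -> Optional[int]:
    """Return the reported GTT size in MiB, or None if the line is absent."""
    match = GTT_RE.search(dmesg_text)
    return int(match.group(1)) if match else None


sample = """[drm] amdgpu: 512M of VRAM memory ready
[drm] amdgpu: 120000M of GTT memory ready."""

print(gtt_size_mib(sample))  # 120000
```

On a real node the text would come from the dmesg command above, for example captured with subprocess, and the result compared against the configured amdgpu.gttsize.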
Setup Option 1: Lemonade SDK (Recommended)
Download pre-built binaries from: https://github.com/lemonade-sdk/llamacpp-rocm/releases/latest/
Download the archive matching your platform and GPU target, for example: llama-bxxxx-ubuntu-rocm-gfx1151-x64.zip
Extract and prepare:
unzip llama-bxxxx-ubuntu-rocm-gfx1151-x64.zip
cd llama-bxxxx-ubuntu-rocm-gfx1151-x64
chmod +x llama-cli llama-server rpc-server
Verify GPU detection:
$ ./llama-cli --list-devices
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
Available devices:
ggml_backend_cuda_get_available_uma_memory: final available_memory_kb: 127697544
  ROCm0: AMD Radeon Graphics (120000 MiB, 124704 MiB free)
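A similar check can confirm that llama.cpp itself sees the full allocation on every node. The sketch below pulls the device name and memory figures out of the device listing; it assumes the "NAME: description (N MiB, M MiB free)" line format shown above, which may change between llama.cpp builds.

```python
# Sketch only: parse device lines from `llama-cli --list-devices` output.
import re
from typing import List, Tuple

DEVICE_RE = re.compile(
    r"^\s*(\w+): .+? \((\d+) MiB, (\d+) MiB free\)\s*$", re.MULTILINE
)


def list_devices(output: str) -> List[Tuple[str, int, int]]:
    """Return (device name, total MiB, free MiB) for each listed device."""
    return [(name, int(total), int(free))
            for name, total, free in DEVICE_RE.findall(output)]


sample = """ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
Available devices:
  ROCm0: AMD Radeon Graphics (120000 MiB, 124704 MiB free)"""

print(list_devices(sample))  # [('ROCm0', 120000, 124704)]
```

A node whose total comes back below 120000 MiB most likely did not pick up the kernel parameters.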
Setup Option 2: Manual Source Build
Install ROCm 7.0.2 on Ubuntu 24.04.3:
wget https://repo.radeon.com/amdgpu-install/7.0.2/ubuntu/noble/amdgpu-install_7.0.2.70002-1_all.deb
sudo apt install ./amdgpu-install_7.0.2.70002-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,
The article continues with additional setup steps and inference configuration details.
📖 Read the full source: HN LLM Tools