Four aarch64-specific failure modes when running vLLM on Blackwell GB10 with CUDA 13.0

Setup and environment
The setup uses GB10 hardware with aarch64 (sbsa-linux), Python 3.12, CUDA 13.0, and vLLM v0.7.1. The issues emerged in a daily-reset test environment and are specific to aarch64 with CUDA 13.0.
Failure mode 1: cu121 wheel doesn't exist for aarch64
Installing with --index-url .../cu121 fails with: ERROR: Could not find a version that satisfies the requirement torch (from versions: none). The cu121 index has no aarch64 wheels. The correct index for Blackwell aarch64 is cu130.
sudo pip3 install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/cu130 \
  --break-system-packages
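Because this failure is specific to aarch64, it can help to check the machine architecture before picking an index. The following is a minimal sketch, not part of the original post: the helper name torch_index_for_arch is hypothetical, the URL is the cu130 nightly index from the command above, and other architectures are deliberately left unhandled.

```shell
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to the PyTorch wheel index used in this write-up.
torch_index_for_arch() {
    case "$1" in
        aarch64|arm64)
            # Index used above for Blackwell aarch64.
            echo "https://download.pytorch.org/whl/nightly/cu130"
            ;;
        *)
            # Other architectures are out of scope here: report and fail.
            echo "no index recorded for architecture: $1" >&2
            return 1
            ;;
    esac
}

# Usage (fails on hosts that are not aarch64):
# torch_index_for_arch "$(uname -m)"
```

The function only prints a URL, so it can be run safely before any install step.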
Failure mode 2: ncclWaitSignal undefined symbol
After installing cu130 torch, importing it fails with: ImportError: libtorch_cuda.so: undefined symbol: ncclWaitSignal. The apt-installed NCCL doesn't export this symbol, but the pip-installed nvidia-nccl-cu13 does. The dynamic loader doesn't pick up the pip copy automatically.
Fix: Force it via LD_PRELOAD before every Python call:
export LD_PRELOAD=/usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2
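Before preloading, it may be worth confirming that the pip-installed library really exports the missing symbol. The sketch below assumes binutils nm and awk are available; the helper names nm_defines_symbol and preload_nccl_if_suitable are hypothetical, and the check is a simplification that only accepts unversioned symbols of type T.

```shell
# Path from the fix above; adjust if your site-packages location differs.
NCCL_LIB=/usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2

# Hypothetical helper. stdin: output of `nm -D <library>`; $1: symbol name.
# Succeeds only if the symbol is listed as defined in the text section (type T).
nm_defines_symbol() {
    awk -v sym="$1" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

# Preload the pip-installed NCCL only if it really exports ncclWaitSignal.
preload_nccl_if_suitable() {
    if nm -D "$NCCL_LIB" 2>/dev/null | nm_defines_symbol ncclWaitSignal; then
        export LD_PRELOAD="$NCCL_LIB"
        echo "LD_PRELOAD set to $NCCL_LIB"
    else
        echo "ncclWaitSignal not found in $NCCL_LIB" >&2
        return 1
    fi
}
```

Calling preload_nccl_if_suitable in the shell that later starts Python keeps the export and the check in one place.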
Failure mode 3: numa.h not found during vLLM CPU extension build
The error: fatal error: numa.h: No such file or directory. vLLM's CPU extension requires libnuma-dev, which wasn't installed on the reset system.
sudo apt-get install -y libnuma-dev
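On a system that is reset daily, this step is easy to forget, so a guard that runs before the build may help. This is a sketch under the assumption that libnuma-dev places the header at /usr/include/numa.h, as it normally does on Debian and Ubuntu; both helper names are hypothetical.

```shell
# Hypothetical helper: succeed if numa.h exists under the given include
# directory (defaults to /usr/include).
have_numa_header() {
    [ -f "${1:-/usr/include}/numa.h" ]
}

# Install libnuma-dev only when the header is missing.
ensure_libnuma_dev() {
    have_numa_header || sudo apt-get install -y libnuma-dev
}
```

Running ensure_libnuma_dev at the top of a build script makes the step idempotent across resets.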
Failure mode 4: ABI mismatch — MessageLogger undefined symbol
After completing the full build, launching vLLM fails with: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib.
Diagnosis with nm shows:
- What the vLLM binary expected (old signature):
U _ZN3c1013MessageLoggerC1EPKciib ← (const char*, int, int, bool)
- What the cu130 torch library actually provides (new signature):
T _ZN3c1013MessageLoggerC1ENS_14SourceLocationEib ← (SourceLocation, int, bool)
Root cause: pip's build isolation. When running pip install -e ., pip creates an isolated build environment and downloads a separate older torch based on pyproject.toml version constraints. vLLM compiles against those old headers, but at runtime the newer cu130 torch is found, causing signature mismatch.
Fix: Use --no-build-isolation with explicit subprocess injection:
sudo -E env \
  LD_PRELOAD="/usr/local/lib/python3.12/dist-packages/nvidia/nccl/lib/libnccl.so.2" \
  LD_LIBRARY_PATH="/usr/local/lib/python3.12/dist-packages/torch/lib:..." \
  MAX_JOBS=8 \
  pip3 install -e . --no-deps --no-build-isolation --break-system-packages
Important detail: sudo -E alone doesn't work because pip's subprocess chain doesn't carry LD_PRELOAD. You need sudo -E env VAR=value pip3 to inject into the subprocess explicitly.
Verify the ABI after installation:
nm -D vllm/_C.abi3.so | grep MessageLogger
The output must contain "SourceLocation". If it still shows "EPKciib", reinstall.
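The nm check can also be turned into something a script can act on. The sketch below classifies nm output by the two mangled constructor names quoted in the diagnosis; the helper name classify_messagelogger_abi is hypothetical, and it assumes awk is available.

```shell
# Hypothetical helper. stdin: output of `nm -D` for a binary.
# Prints which MessageLogger constructor signature the binary refers to.
classify_messagelogger_abi() {
    awk '
        /MessageLoggerC1ENS_14SourceLocationEib/ { new = 1 }
        /MessageLoggerC1EPKciib/                 { old = 1 }
        END {
            if (old)      print "old"      # (const char*, int, int, bool)
            else if (new) print "new"      # (SourceLocation, int, bool)
            else          print "unknown"  # symbol not referenced at all
        }
    '
}

# Usage, from the vLLM source tree:
# nm -D vllm/_C.abi3.so | classify_messagelogger_abi
```

A result of "old" means the extension was compiled against the older torch headers and needs to be rebuilt with --no-build-isolation.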
Additional note for multi-agent systems
If you use vLLM as a backend for a multi-agent system, add --served-model-name your-model-name. Without it, vLLM serves the model under its full file path, and agents get a 404 when they query it by name.
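As an illustration, the launch command and a quick check of the advertised model name might look like the sketch below. The model path is a placeholder, the helper names are hypothetical, and the URL assumes vLLM's OpenAI-compatible server is listening on its default port 8000 on the same host.

```shell
MODEL_PATH=/path/to/model      # placeholder, not from the original post
SERVED_NAME=your-model-name    # name that agents will send in requests

# Print the launch command rather than running it, so it can be reviewed first.
vllm_launch_command() {
    printf 'vllm serve %s --served-model-name %s\n' "$MODEL_PATH" "$SERVED_NAME"
}

# List the model ids the running server advertises (assumes localhost:8000).
list_served_models() {
    curl -s http://localhost:8000/v1/models
}
```

Once the server is up, the output of list_served_models should show your-model-name rather than the file path.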
The full v2 protocol, including automation script and systemd service, is available at github.com/trgysvc/AutonomousNativeForge → docs/BLACKWELL_SETUP_V2.md. The repo is for ANF — a 4-agent autonomous coding pipeline running on top of this setup, but the setup docs stand alone if you just need the Blackwell/vLLM fixes.
📖 Read the full source: r/LocalLLaMA