Trellis 2 Successfully Running on ROCm 7.11 with AMD RX 9070 XT

Getting Trellis 2 Running on AMD Hardware
A developer has successfully run Trellis 2 on an AMD RX 9070 XT GPU using ROCm 7.11 on Linux Mint 22.3. This addresses common issues where users encountered geometry cutoff, preview failures, and other errors when attempting to run Trellis 2 on AMD hardware.
Key Issues and Solutions
The developer identified two main problems that were causing most failures:
1. ROCm Instability with High N Tensors
ROCm operations become unstable on tensors with a very large first dimension (N, the number of rows), producing overflows or NaN values. The original code in linear.py in the sparse folder used:
def forward(self, input: VarLenTensor) -> VarLenTensor:
    return input.replace(super().forward(input.feats))

The fix implements chunked processing to avoid ROCm issues:
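import torch
import torch.nn.functional as F

# Maximum number of rows passed to F.linear in one call.
ROCM_SAFE_CHUNK = 524_288

def rocm_safe_linear(feats: torch.Tensor, weight: torch.Tensor, bias=None) -> torch.Tensor:
    """F.linear with ROCm large-N chunking workaround."""
    N = feats.shape[0]
    if N <= ROCM_SAFE_CHUNK:
        # Small enough to run as a single call.
        return F.linear(feats, weight, bias)
    # Preallocate the output, then fill it one row chunk at a time.
    out = torch.empty(N, weight.shape[0], device=feats.device, dtype=feats.dtype)
    for s in range(0, N, ROCM_SAFE_CHUNK):
        e = min(s + ROCM_SAFE_CHUNK, N)
        out[s:e] = F.linear(feats[s:e], weight, bias)
    return out

# Replacement forward() for the layer in linear.py (uses self.weight and self.bias).
def forward(self, input):
    # Accept either a VarLenTensor-style object or a plain tensor.
    feats = input.feats if hasattr(input, 'feats') else input
    out = rocm_safe_linear(feats, self.weight, self.bias)
    if hasattr(input, 'replace'):
        return input.replace(out)
    return out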
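To make the chunking concrete, here is a minimal pure-Python sketch of the row ranges the loop above walks through. The helper name chunk_ranges and the 1,200,000-row example are assumptions for illustration only; they do not come from the original post.

```python
from typing import Iterator, Tuple

ROCM_SAFE_CHUNK = 524_288  # chunk size quoted in the fix


def chunk_ranges(n: int, chunk: int = ROCM_SAFE_CHUNK) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) row ranges using the same loop bounds as rocm_safe_linear."""
    for s in range(0, n, chunk):
        yield s, min(s + chunk, n)


# Hypothetical input size: 1,200,000 rows.
ranges = list(chunk_ranges(1_200_000))
for s, e in ranges:
    print(f"rows [{s}, {e}) -> {e - s} rows")
# rows [0, 524288) -> 524288 rows
# rows [524288, 1048576) -> 524288 rows
# rows [1048576, 1200000) -> 151424 rows
```

Every row is covered exactly once, and the last chunk is simply shorter, so the chunked result should match a single F.linear call on hardware where that call is stable.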
2. Broken hipMemcpy2D in CuMesh
The 2D device copy in CuMesh (cudaMemcpy2D in the source, which runs as hipMemcpy2D on ROCm) was causing vertices and faces to be dropped or corrupted. The original CuMesh initialization used:
void CuMesh::init(const torch::Tensor& vertices, const torch::Tensor& faces) {
    size_t num_vertices = vertices.size(0);
    size_t num_faces = faces.size(0);
    this->vertices.resize(num_vertices);
    this->faces.resize(num_faces);
    CUDA_CHECK(cudaMemcpy2D(
        this->vertices.ptr,
        sizeof(float3),
        vertices.data_ptr(),
        sizeof(float) * 3,
        sizeof(float) * 3,
        num_vertices,
        cudaMemcpyDeviceToDevice
    ));
    ...
}

The fix replaces the 2D copy with a 1D version:
CUDA_CHECK(cudaMemcpy(
    this->vertices.ptr,
    vertices.data_ptr(),
    num_vertices * sizeof(float3),
    cudaMemcpyDeviceToDevice
));

Results and Performance
With these fixes, the developer successfully got the image-to-3D pipeline working, including preview rendering (without normals) and final GLB export. On a test image with 21,204 tokens, the process took approximately 280 seconds from start to preview generation. The run used 1024 resolution with all samplers set to 20 steps.
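A footnote on the CuMesh fix: the 1D copy can stand in for the 2D one because, in the original call, the destination pitch (sizeof(float3)), the source pitch and the row width are all 12 bytes, so the rows are already packed back to back. The sketch below is a CPU emulation in plain Python, not CuMesh or HIP code; memcpy_2d and the sample vertex data are hypothetical names made up for illustration.

```python
import struct


def memcpy_2d(dst: bytearray, dpitch: int, src: bytes, spitch: int,
              width: int, height: int) -> None:
    """CPU emulation of a pitched 2D copy: `height` rows of `width` bytes each."""
    for row in range(height):
        d = row * dpitch  # byte offset of this row in the destination
        s = row * spitch  # byte offset of this row in the source
        dst[d:d + width] = src[s:s + width]


FLOAT3_SIZE = 3 * 4  # three 32-bit floats, 12 bytes, no padding

# Hypothetical mesh data: 4 vertices with xyz packed back to back.
vertices = [float(i) for i in range(12)]
src = struct.pack("<12f", *vertices)
num_vertices = 4

# 2D copy with the same pitch and width arguments as the original call.
dst_2d = bytearray(num_vertices * FLOAT3_SIZE)
memcpy_2d(dst_2d, FLOAT3_SIZE, src, FLOAT3_SIZE, FLOAT3_SIZE, num_vertices)

# 1D copy of num_vertices * sizeof(float3) bytes, as in the fix.
dst_1d = bytearray(src[:num_vertices * FLOAT3_SIZE])

print(dst_2d == dst_1d)  # True
```

The two copies would only differ if either pitch were larger than the row width, for example if the destination type were padded to 16 bytes.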
📖 Read the full source: r/LocalLLaMA