TRELLIS.2 Image-to-3D Ported to Run Natively on Apple Silicon

What This Is
A port of Microsoft's TRELLIS.2 image-to-3D model that runs natively on Apple Silicon via PyTorch MPS, replacing CUDA-only dependencies with pure-PyTorch alternatives.
Key Details
The original TRELLIS.2 requires CUDA, along with flash_attn, nvdiffrast, and custom sparse convolution kernels, none of which run on a Mac. This port replaces them with:
- A gather-scatter sparse 3D convolution implementation (backends/conv_none.py)
- Attention for the sparse transformers via PyTorch's scaled_dot_product_attention (SDPA)
- Python-based mesh extraction replacing CUDA hashmap operations (backends/mesh_extract.py)
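A common way to feed sparse tokens to flash_attn is its variable-length API, which takes every sequence in a batch packed into one tensor. Assuming that packed [total_tokens, num_heads, head_dim] layout plus a list of per-sample lengths, a minimal SDPA substitute can slice out each sequence and attend within it. The function name varlen_sdpa is hypothetical; this is a sketch, not the port's code.

```python
import torch
import torch.nn.functional as F


def varlen_sdpa(q, k, v, seq_lens):
    """Attention over a packed batch of variable-length sequences (sketch).

    q, k, v: [total_tokens, num_heads, head_dim], all sequences concatenated
    seq_lens: per-sequence token counts that sum to total_tokens
    """
    outputs = []
    start = 0
    for n in seq_lens:
        # Slice one sequence and put heads first: [1, num_heads, n, head_dim]
        qi = q[start:start + n].transpose(0, 1).unsqueeze(0)
        ki = k[start:start + n].transpose(0, 1).unsqueeze(0)
        vi = v[start:start + n].transpose(0, 1).unsqueeze(0)
        # Non-causal attention with the default 1/sqrt(head_dim) scale
        oi = F.scaled_dot_product_attention(qi, ki, vi)
        # Back to the packed layout: [n, num_heads, head_dim]
        outputs.append(oi.squeeze(0).transpose(0, 1))
        start += n
    return torch.cat(outputs, dim=0)


torch.manual_seed(0)
q, k, v = (torch.randn(8, 2, 4) for _ in range(3))
out = varlen_sdpa(q, k, v, [5, 3])  # two samples with 5 and 3 tokens
print(out.shape)
```

Looping per sample keeps tokens from attending across samples without building a mask. With many short sequences, padding to the batch maximum and passing an attn_mask to scaled_dot_product_attention batches the work better.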
Total changes are a few hundred lines across 9 files. All hardcoded .cuda() calls were patched to use the active device instead.
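Replacing hardcoded .cuda() calls usually means resolving one device up front and moving tensors to it. A minimal sketch, where get_device is a hypothetical helper name rather than something from the port:

```python
import torch


def get_device() -> torch.device:
    """Pick the best available backend instead of hardcoding .cuda()."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon GPU
        return torch.device("mps")
    return torch.device("cpu")


device = get_device()
x = torch.zeros(2, 3).to(device)      # replaces torch.zeros(2, 3).cuda()
y = torch.ones(2, 3, device=device)   # or allocate directly on the device
print(device, x.device, y.device)
```

Inside modules, following an existing tensor's device (for example x.to(weight.device)) avoids threading a global device through every call site.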
Performance & Requirements
On an M4 Pro (24GB), it generates ~400K-vertex meshes from a single photo in about 3.5 minutes. Memory usage peaks at around 18GB of unified memory during generation.
Requirements:
- macOS on Apple Silicon (M1 or later)
- Python 3.11+
- 24GB+ unified memory recommended
- ~15GB disk space for model weights
Setup & Usage
Quick start:
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac
hf auth login
bash setup.sh
source .venv/bin/activate
python generate.py path/to/image.png
You need to request access to these gated models on Hugging Face: facebook/dinov3-vitl16-pretrain-lvd1689m and briaai/RMBG-2.0.
Basic usage:
python generate.py photo.png
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512
Limitations
- No texture export (meshes export with vertex colors only)
- Hole filling disabled (meshes may have small holes)
- Slower than CUDA (~10x slower for sparse convolution)
- Inference only, no training support
Technical Implementation
The sparse 3D convolution builds a spatial hash of active voxels, gathers neighbor features for each kernel position, applies weights via matrix multiplication, and scatter-adds results back. Mesh extraction reimplements flexible_dual_grid_to_mesh using Python dictionaries instead of CUDA hashmap operations.
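The gather-scatter scheme described above can be sketched as a 3x3x3 convolution that produces outputs only at active voxels (submanifold semantics, which is an assumption here). The sketch also assumes coordinates as an [N, 3] integer tensor, a weight in torch.nn.Conv3d's [C_out, C_in, 3, 3, 3] layout, and a plain Python dict as the spatial hash. sparse_conv3d is a hypothetical name, not the API of backends/conv_none.py.

```python
import torch
import torch.nn.functional as F


def sparse_conv3d(coords, feats, weight):
    """Submanifold 3x3x3 sparse conv via gather / matmul / scatter-add (sketch).

    coords: [N, 3] integer voxel coordinates (z, y, x) of active sites
    feats:  [N, C_in] features of the active sites
    weight: [C_out, C_in, 3, 3, 3], same layout as torch.nn.Conv3d
    Returns [N, C_out] features at the same active sites.
    """
    coord_list = coords.tolist()
    # Spatial hash: voxel coordinate -> row index in feats
    table = {tuple(c): i for i, c in enumerate(coord_list)}
    out = feats.new_zeros(feats.shape[0], weight.shape[0])
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                # Pair each active site with its active neighbor at this offset
                out_idx, in_idx = [], []
                for i, (z, y, x) in enumerate(coord_list):
                    j = table.get((z + dz, y + dy, x + dx))
                    if j is not None:
                        out_idx.append(i)
                        in_idx.append(j)
                if not out_idx:
                    continue
                out_t = torch.tensor(out_idx, device=feats.device)
                in_t = torch.tensor(in_idx, device=feats.device)
                w = weight[:, :, dz + 1, dy + 1, dx + 1]  # [C_out, C_in]
                # Gather neighbor features, apply weights, scatter-add back
                out.index_add_(0, out_t, feats[in_t] @ w.T)
    return out


torch.manual_seed(0)
coords = torch.tensor([[1, 1, 1], [1, 1, 2], [2, 1, 1], [3, 3, 3], [0, 4, 2]])
feats = torch.randn(5, 2)
weight = torch.randn(4, 2, 3, 3, 3)
out = sparse_conv3d(coords, feats, weight)

# Cross-check against a dense conv on a zero-filled 5^3 grid
z, y, x = coords[:, 0], coords[:, 1], coords[:, 2]
grid = torch.zeros(2, 5, 5, 5)
grid[:, z, y, x] = feats.T
ref = F.conv3d(grid.unsqueeze(0), weight, padding=1)[0][:, z, y, x].T
print(torch.allclose(out, ref, atol=1e-5))
```

Reading a dense F.conv3d (padding=1, zeros at inactive voxels) back at the active sites gives the same result, which makes a convenient correctness check. The per-offset Python loop is the slow part; a vectorized variant can linearize coordinates into integer keys and find neighbors with torch.searchsorted on the sorted keys.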
Benchmarks on M4 Pro (24GB), pipeline type 512:
- Model loading: ~45s
- Image preprocessing: ~5s
- Sparse structure sampling: ~15s
- Shape SLat sampling: ~90s
- Texture SLat sampling: ~50s
- Mesh decoding: ~30s
- Total: ~3.5 min
📖 Read the full source: HN LLM Tools