1-Bit Bonsai Image 4B: On-Device Image Generation via Binary/Ternary FLUX.2

PrismML has released Bonsai Image 4B, a family of compact image-generation models derived from FLUX.2 Klein 4B using binary and ternary quantization. The diffusion transformer weights are represented as {−1, +1} (1-bit) or {−1, 0, +1} (ternary) with FP16 group-wise scaling factors, yielding 1.125 and 1.71 effective bits per weight respectively.
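The bits-per-weight figures above can be reproduced with a short calculation. The sketch below is illustrative only: the group size of 128 is an assumption (the article does not state it), chosen because it makes one FP16 scale per group cost 0.125 bits per weight. The `quantize_group` helper is hypothetical and uses a simple mean-absolute-value scale, which is not necessarily how PrismML derives its scales.

```python
import math

GROUP_SIZE = 128  # assumption: group size is not given in the article
SCALE_BITS = 16   # one FP16 scaling factor per group (from the article)


def effective_bits(levels, group_size=GROUP_SIZE, scale_bits=SCALE_BITS):
    """Ideal bits per weight: log2(levels) for the code plus the amortized scale."""
    return math.log2(levels) + scale_bits / group_size


def quantize_group(weights, ternary=False):
    """Map one group of weights to {-1, +1} or {-1, 0, +1} codes plus one scale.

    Illustrative choices (not taken from the article): the scale is the mean
    absolute value, and ternary mode zeroes weights smaller than half the scale.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = []
    for w in weights:
        if ternary and abs(w) < 0.5 * scale:
            codes.append(0)
        else:
            codes.append(1 if w >= 0 else -1)
    return codes, scale


def dequantize_group(codes, scale):
    """Reconstruct approximate weights from codes and the shared group scale."""
    return [c * scale for c in codes]


print(round(effective_bits(2), 3))  # 1.125 (binary)
print(round(effective_bits(3), 2))  # 1.71 (ternary)
```

Under these assumptions the binary case is exactly 1 + 16/128 = 1.125 bits, and the ternary case is log2(3) + 16/128, which rounds to 1.71.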
Key Specifications
- 1-bit Bonsai Image 4B: transformer footprint 0.93 GB (8.3× reduction from 7.75 GB FP16 FLUX.2 Klein 4B). Apple Silicon payload (including compressed text encoder + FP16 VAE) is 3.42 GB.
- Ternary Bonsai Image 4B: transformer footprint 1.21 GB (6.4× reduction). Apple Silicon payload 3.88 GB.
- Mean active memory for 512×512 generation: 1.5 GB (1-bit) / 1.96 GB (ternary) vs 11.74 GB for original FLUX.2 Klein 4B.
- Mean active memory for 1024×1024 generation: 1.95 GB (1-bit) / 2.38 GB (ternary) vs 14.39 GB for the original.
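As a quick sanity check, the reduction factors follow directly from the sizes listed above. This is a small arithmetic sketch using only the article's figures; the variable and function names are made up for illustration.

```python
FP16_TRANSFORMER_GB = 7.75  # FP16 FLUX.2 Klein 4B transformer (from the article)

# Transformer footprints in GB (from the article).
footprint_gb = {"1-bit": 0.93, "ternary": 1.21}

# Mean active memory in GB: (Bonsai 1-bit, Bonsai ternary, original FLUX.2 Klein 4B).
active_memory_gb = {
    "512x512": (1.5, 1.96, 11.74),
    "1024x1024": (1.95, 2.38, 14.39),
}


def reduction(baseline, compressed):
    """How many times smaller the compressed figure is than the baseline."""
    return baseline / compressed


for name, size in footprint_gb.items():
    print(f"{name} transformer: {reduction(FP16_TRANSFORMER_GB, size):.1f}x smaller")

for res, (one_bit, ternary, original) in active_memory_gb.items():
    print(
        f"{res} active memory: {reduction(original, one_bit):.1f}x (1-bit), "
        f"{reduction(original, ternary):.1f}x (ternary)"
    )
```

The transformer ratios come out at about 8.3× and 6.4×, matching the stated reductions, and the active-memory savings are roughly 6× to 8× at both resolutions.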
Performance Benchmarks
The model runs on Apple Silicon (iPhones, iPads, Macs) via MLX low-bit paths, and on CUDA GPUs via Gemlite low-bit GEMM kernels. Generation times:
- iPhone 17 Pro Max: 9.4 seconds for a 512×512 image
- Mac M4 Pro: ~6 seconds for a 512×512 image (up to 5.6× faster than the stock full-precision MFLUX pipeline)
Most of the transformer's size reduction comes from the binary and ternary layers, which are ~14× and ~10× smaller than their FP16 equivalents, while a small set of precision-sensitive projection layers (~5%) remains in FP16. Quality and prompt fidelity are evaluated on GenEval, HPSv3, and DPG-Bench.
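A rough estimate shows how per-layer compression of ~14× or ~10× turns into the smaller whole-transformer reductions once some layers stay in FP16. The sketch assumes that "~5%" refers to the fraction of parameters kept in FP16, which the article does not confirm, and the function name is hypothetical.

```python
FP16_BITS = 16.0


def blended_bits(low_bits, fp16_fraction=0.05):
    """Average bits per weight when a fraction of parameters stays in FP16.

    Assumption: fp16_fraction is a share of parameters, not of layer count.
    """
    return (1.0 - fp16_fraction) * low_bits + fp16_fraction * FP16_BITS


for name, low_bits in (("1-bit", 1.125), ("ternary", 1.71)):
    per_layer = FP16_BITS / low_bits                # quantized layers only
    overall = FP16_BITS / blended_bits(low_bits)    # with ~5% kept in FP16
    print(f"{name}: {per_layer:.1f}x per layer, {overall:.1f}x overall")
```

Under this assumption the overall estimates are about 8.6× (1-bit) and 6.6× (ternary), in the same range as the reported 8.3× and 6.4×. The remaining gap may come from details this estimate ignores, such as storage overhead or a slightly different FP16 share.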
Who It's For
Developers deploying image generation on-device (laptops, phones, edge devices) who need open weights and practical local inference without cloud dependency.
📖 Read the full source: HN LLM Tools