Antigravity 2.0 Tops OpenSCAD Architectural 3D Benchmark – ModelRift Tests 6 LLMs on the Pantheon

ModelRift ran a practical benchmark: they asked six AI coding tools to build the Pantheon in OpenSCAD from reference images. The goal was to test how well each system turns architectural reference material into parametric CAD code. The prompt used two images (front facade and aerial view) and required using the OpenSCAD CLI to preview and iterate.
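The preview-and-iterate loop can be sketched as a small Python wrapper around the OpenSCAD CLI. This is a minimal sketch, not the benchmark's harness: the file names, the `column_count` override, and the helper names are hypothetical. The flags used (`-o`, `-D`, `--imgsize`, `--autocenter`, `--viewall`) are standard OpenSCAD command-line options.

```python
import shutil
import subprocess


def preview_command(scad_file, png_file, overrides=None, size=(800, 600)):
    """Build an OpenSCAD CLI call that renders scad_file to a PNG preview."""
    cmd = [
        "openscad",
        "-o", png_file,                      # output type is inferred from the extension
        f"--imgsize={size[0]},{size[1]}",    # image width,height in pixels
        "--autocenter",
        "--viewall",                         # fit the whole model in the frame
    ]
    # Each -D overrides a top-level variable in the .scad file (hypothetical names).
    for name, value in (overrides or {}).items():
        cmd += ["-D", f"{name}={value}"]
    cmd.append(scad_file)
    return cmd


def run_if_available(cmd):
    """Run the command only when the openscad binary is on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).returncode


cmd = preview_command("pantheon.scad", "preview.png", {"column_count": 8})
print(" ".join(cmd))
```

An agent would call `run_if_available(cmd)` after each code change, inspect the PNG, and adjust parameters or geometry before the next pass.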
Why Pantheon + OpenSCAD?
Basic prompts like "cube with a hole" test only simple syntax (difference, cube, cylinder). The Pantheon sits in a middle ground: it has radial symmetry (rotunda, dome, oculus), straight portico faces, columns, stepped bases, and a triangular pediment. This mix tests an LLM's ability to handle nested transformations, Boolean operations, loops, and named modules — all native to OpenSCAD's plain-text representation. OpenSCAD keeps geometry as the artifact, avoiding the indirection of Blender MCPs or UI actions.
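To make those constructs concrete, here is a heavily simplified sketch in Python that emits OpenSCAD source. It is not any tool's benchmark output. All dimensions and module names are illustrative assumptions; the point is only to show named modules, a `for` loop over columns, nested `translate` calls, and a `difference()` that cuts the oculus.

```python
# Illustrative parameters only; none of these values come from the benchmark.
PARAMS = {
    "rotunda_r": 20,
    "wall_h": 15,
    "oculus_r": 4,
    "column_count": 8,
    "column_h": 12,
    "column_r": 0.8,
}

# Static OpenSCAD body: three named modules plus two top-level calls.
MODULES = """
module column(h, r) {
    cylinder(h = h, r = r, $fn = 32);
}

module portico() {
    spacing = 2 * rotunda_r / column_count;
    for (i = [0 : column_count - 1])
        translate([(i - (column_count - 1) / 2) * spacing, -rotunda_r - 6, 0])
            column(column_h, column_r);
}

module rotunda() {
    cylinder(h = wall_h, r = rotunda_r, $fn = 96);
    translate([0, 0, wall_h])
        difference() {
            sphere(r = rotunda_r, $fn = 96);
            // remove the lower hemisphere
            translate([0, 0, -rotunda_r])
                cube(2 * rotunda_r, center = true);
            // cut the oculus through the crown
            cylinder(h = 2 * rotunda_r, r = oculus_r, $fn = 48);
        }
}

rotunda();
portico();
"""


def build_scad(params):
    """Prefix the module body with one top-level assignment per parameter."""
    header = "\n".join(f"{name} = {value};" for name, value in params.items())
    return header + "\n" + MODULES


source = build_scad(PARAMS)
print(source)
```

Keeping the parameters in a header means a single value such as `column_count` can be changed, or overridden from the CLI, without touching the geometry code.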
Benchmark Results
Six systems were tested. Each run was rated on time and on output quality (each out of 5) and given a short summary. The table below shows three of the six results:
| Tool & Model | Speed (higher is faster) | Quality | Summary |
|---|---|---|---|
| Antigravity 2.0 | ●●●○○ (3/5) | ●●●●○ (4.5/5) | Best quality. Captured Pantheon proportions, dome with oculus, portico, columns, pediment, and facade details. Architecture most faithful to references. |
| Codex 5.5 High | ●●●●○ (4/5) | ●●●○○ (3.0/5) | Strong detail density, including inscription on entablature. But final STL didn't match PNG preview, holding score down. |
| Cursor 3.5 / Composer 2.5 | ●●●●● (5/5) | ●○○○○ (1.4/5) | Fastest run but weakest output: poor proportions, color discipline, and architectural details. |
The full results include three more entries, not detailed here. The benchmark code and render comparisons are available in the original post.
Practical Takeaways
- Antigravity 2.0 produced the most architecturally accurate OpenSCAD code, with correct dome rings, column spacing, and facade relationships.
- Codex 5.5 added fine details (inscription) but suffered an export mismatch — the preview looked better than the final STL.
- Cursor 3.5 was quick but the geometry was crude; it's fine for rapid prototyping but not for production CAD.
- The benchmark supports the view that OpenSCAD is a strong target for LLM-generated geometry: plain text, compact vocabulary, and easy iteration via CLI.
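The source does not say why the Codex STL diverged from its PNG preview. One known difference in the OpenSCAD CLI is that PNG output uses the fast preview path unless `--render` is passed, while STL export always evaluates the full geometry. Assuming that was a factor, a minimal guard is to produce both artifacts from the fully rendered model. The file names and helper name below are hypothetical.

```python
def export_commands(scad_file, stem):
    """Return (png_cmd, stl_cmd) that both use full geometry evaluation."""
    # --render forces full evaluation for the PNG instead of the quick preview.
    png_cmd = ["openscad", "--render", "-o", f"{stem}.png", scad_file]
    # STL export always evaluates the full geometry, so no extra flag is needed.
    stl_cmd = ["openscad", "-o", f"{stem}.stl", scad_file]
    return png_cmd, stl_cmd


png_cmd, stl_cmd = export_commands("pantheon.scad", "pantheon")
print(" ".join(png_cmd))
print(" ".join(stl_cmd))
```

Comparing the rendered PNG against the reference images, rather than the quick preview, makes it more likely that what was scored visually is what ends up in the STL.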
If you're using AI coding agents for parametric 3D modeling, especially for architectural or mechanical parts with radial symmetry and Boolean operations, this benchmark gives a clear signal: Antigravity 2.0 currently leads in quality. For speed-first tasks, Cursor 3.5 might still be useful if you're willing to iterate heavily.
📖 Read the full source: HN LLM Tools