Open Source vs Frontier Models: Single-File Canvas Car Scene Benchmark

✍️ OpenClawRadar · 📅 Published: May 17, 2026

A developer ran the same single-file Canvas prompt across 12 models to compare open-source and frontier models on a realistic side-view scene of a car driving. The task was to produce one standalone HTML file, with no libraries and no external assets, that renders parallax scenery, spinning wheels, subtle body motion, and cinematic lighting, and that loops seamlessly. The test harness is OpenCodeOrchestra, and the results are live at oco-canvas-car-scene-compare.
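The seamless-loop requirement means every moving element has to return to exactly its starting state after one loop period. Below is a minimal TypeScript sketch of that timing math. It assumes a fixed loop period and scenery drawn as repeating horizontal tiles. It is not taken from any model's output or from the OpenCodeOrchestra repository. `LOOP_SECONDS`, `ParallaxLayer`, and the helper functions are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of seamless-loop timing for a side-scrolling car scene.
// Names and numbers are illustrative assumptions, not from the benchmark.

const LOOP_SECONDS = 8; // assumed loop period T, in seconds

// Wrap any time t into [0, period) so the animation repeats exactly every period.
function loopPhase(t: number, period: number = LOOP_SECONDS): number {
  return ((t % period) + period) % period;
}

interface ParallaxLayer {
  tileWidth: number;    // width in px of the repeating scenery strip
  tilesPerLoop: number; // whole tiles scrolled per loop (an integer keeps it seamless)
}

// Horizontal scroll offset in [0, tileWidth) for one layer at time t.
// Distant layers use fewer tiles per loop, so they appear to move more slowly.
function layerOffset(layer: ParallaxLayer, t: number): number {
  const fraction = loopPhase(t) / LOOP_SECONDS; // progress through the loop, 0..1
  const distance = fraction * layer.tilesPerLoop * layer.tileWidth;
  return distance % layer.tileWidth;
}

// Wheel rotation in radians for a wheel that turns a whole number of times per loop.
function wheelAngle(turnsPerLoop: number, t: number): number {
  return (loopPhase(t) / LOOP_SECONDS) * turnsPerLoop * 2 * Math.PI;
}

// Rolling without slipping: the road advances 2*pi*r per wheel turn,
// so the road tile width is derived from the wheel radius.
function roadTileWidth(wheelRadius: number, turnsPerLoop: number, tilesPerLoop: number): number {
  return (turnsPerLoop * 2 * Math.PI * wheelRadius) / tilesPerLoop;
}

// Subtle body motion: a small vertical bob with a whole number of cycles per loop.
function bodyBob(amplitudePx: number, cyclesPerLoop: number, t: number): number {
  return amplitudePx * Math.sin((loopPhase(t) / LOOP_SECONDS) * cyclesPerLoop * 2 * Math.PI);
}
```

Because every rate is a whole number of cycles per loop, the values at time t and at t + T are identical. A Canvas renderer would then draw each layer's tile at x = -offset and again at x = -offset + tileWidth so the strip has no visible gap.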

Models Tested

Each model ran in an isolated Orchestrator at its highest available thinking/effort setting. The models were GPT-5.5 (xhigh), GPT-5.4 (xhigh), Claude Opus 4.7 (max effort), Claude Opus 4.6 (max effort), Claude Sonnet 4.6 (high effort), Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash, GLM-5.1, MiniMax M2.7, Qwen 3.6 Plus, and Grok 4.3. Throughput (tokens per second) and generation time were not measured.

Key Findings

  • Some models used auditor models internally; others did not.
  • The gallery shows some clear winners alongside more ambiguous results.
  • MiMo V2.5 Pro was excluded because of billing issues with the OpenCode Go subscription.

The gallery page shows each model's output side by side for comparison. The source code is on GitHub at AidenGeunGeun/oco-canvas-car-scene-compare.

📖 Read the full source: r/LocalLLaMA
