Open Source vs Frontier Models: Single-File Canvas Car Scene Benchmark

A developer ran the same single-file Canvas prompt across 12 models to compare open-source and frontier model capabilities on a realistic side-view car driving scene. The task: one standalone HTML file, no libraries, no external assets, with parallax scenery, spinning wheels, subtle body motion, cinematic lighting, and seamless looping. The test harness is OpenCodeOrchestra, and results are live at oco-canvas-car-scene-compare.
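The source does not show how any model implemented parallax or seamless looping, so the following is only a minimal sketch of one common approach, not any model's output. The `Layer` shape, the `depth` factor, and every function name here are hypothetical. Each scenery layer scrolls at a fraction of road speed and wraps at its tile width, and a loop is treated as seamless when every layer returns to offset zero at the loop distance.

```typescript
// Hypothetical layer description; names and values are illustrative only.
interface Layer {
  name: string;
  tileWidth: number; // width in px of one repeating scenery tile
  depth: number;     // 0 = static backdrop, 1 = scrolls at road speed
}

// Horizontal scroll offset of a layer, wrapped into [0, tileWidth).
// The double modulo keeps the result non-negative for negative distances.
function layerOffset(layer: Layer, distance: number): number {
  const raw = distance * layer.depth;
  return ((raw % layer.tileWidth) + layer.tileWidth) % layer.tileWidth;
}

// Wheel rotation in radians for rolling without slipping:
// angle = distance travelled / wheel radius, wrapped to one turn.
function wheelAngle(distance: number, radius: number): number {
  return (distance / radius) % (2 * Math.PI);
}

// True when every layer is back at offset 0 after loopDistance px,
// so the last frame lines up with the first. The exact comparison
// assumes values that multiply cleanly in floating point.
function loopsSeamlessly(layers: Layer[], loopDistance: number): boolean {
  return layers.every((layer) => layerOffset(layer, loopDistance) === 0);
}
```

In a Canvas render loop, each layer's tile would typically be drawn twice, at `-offset` and at `-offset + tileWidth`, so the wrap point never shows a gap.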
Models Tested
Each model ran in an isolated Orchestrator at its highest available thinking/effort setting. The 12 models were GPT-5.5 xhigh, GPT-5.4 xhigh, Claude Opus 4.7 (max effort), Claude Opus 4.6 (max effort), Claude Sonnet 4.6 (high effort), Kimi K2.6, DeepSeek V4 Pro, DeepSeek V4 Flash, GLM-5.1, MiniMax M2.7, Qwen 3.6 Plus, and Grok 4.3. Tokens per second and generation time were not measured.
Key Findings
- Some models used auditor models internally; some didn't.
- Clear winners and ambiguous results are visible in the gallery.
- MiMo V2.5 Pro was excluded because of billing issues with the OpenCode Go subscription.
The gallery page allows side-by-side comparison of each model's output. Source code is on GitHub at AidenGeunGeun/oco-canvas-car-scene-compare.
📖 Read the full source: r/LocalLLaMA