Kimi K2.6 vs Claude Opus 4.7: Hands-On Test with a Minetest Bounty Board Mod

A real-world comparison of two models on an unusual coding task: building a Minetest/Luanti bounty board game mod with a TypeScript backend, then extending it with Google Sheets logging through Composio. Both models received the same prompts. All details come from the source post.
Setup
- Claude Opus 4.7: via Claude Code
- Kimi K2.6: via OpenCode on OpenRouter
- Task: a player joins the world, runs /bounty, gets a task, completes it, gets a reward, and the backend records the completion. Second test: log completions to Google Sheets via Composio.
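The task flow can be pictured with a minimal in-memory sketch. This is not the code either model produced; the `BountyBoard` class, its methods, and the sample bounty are all hypothetical names chosen for illustration, and a real backend would sit behind HTTP endpoints called by the Lua mod.

```typescript
// Minimal in-memory sketch of the /bounty flow. All names are hypothetical.
type Bounty = { id: number; task: string; reward: number };
type Completion = { player: string; bountyId: number; reward: number };

class BountyBoard {
  private bounties: Bounty[];
  private active = new Map<string, Bounty>();
  private completions: Completion[] = [];

  constructor(bounties: Bounty[]) {
    this.bounties = bounties;
  }

  // Player runs /bounty: hand out a task, or return the one already assigned.
  assign(player: string): Bounty {
    const current = this.active.get(player);
    if (current) return current;
    const bounty = this.bounties[this.completions.length % this.bounties.length];
    this.active.set(player, bounty);
    return bounty;
  }

  // Player finishes the task: record the completion and release the bounty.
  complete(player: string): Completion | undefined {
    const bounty = this.active.get(player);
    if (!bounty) return undefined; // nothing assigned, nothing to record
    this.active.delete(player);
    const record = { player, bountyId: bounty.id, reward: bounty.reward };
    this.completions.push(record);
    return record;
  }

  // Total reward per player, highest first.
  leaderboard(): Array<[string, number]> {
    const totals = new Map<string, number>();
    for (const c of this.completions) {
      totals.set(c.player, (totals.get(c.player) ?? 0) + c.reward);
    }
    return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
  }
}

const board = new BountyBoard([{ id: 1, task: "Dig 10 stone", reward: 5 }]);
board.assign("alice");
const done = board.complete("alice");
console.log(done, board.leaderboard());
```

The second test would add one more step inside `complete`: forwarding the recorded completion to Google Sheets.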
Pricing
- Opus 4.7: $5/M input, $25/M output
- Kimi K2.6: $0.95/M input, $4/M output (cached input $0.16/M)
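To make the per-token prices concrete, here is a small sketch that converts token counts into dollars. The `tokenCost` helper is a hypothetical name, and the example only prices the 54.8k output tokens reported for Opus in Test 1; the post does not give a cache-read price for Opus, so the full ~$3.59 bill cannot be reproduced from these numbers alone.

```typescript
// Per-million-token prices quoted in the post (USD).
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const OPUS_4_7: Pricing = { inputPerM: 5, outputPerM: 25 };
const KIMI_K2_6: Pricing = { inputPerM: 0.95, outputPerM: 4 };

// Hypothetical helper: cost in USD for uncached input plus output tokens.
function tokenCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

// Output-only cost of 54.8k tokens at each model's rate.
const opusOutputCost = tokenCost(0, 54_800, OPUS_4_7);
const kimiOutputCost = tokenCost(0, 54_800, KIMI_K2_6);

console.log(opusOutputCost.toFixed(2)); // "1.37"
console.log(kimiOutputCost.toFixed(2)); // "0.22"
```

The same output volume is about six times cheaper at Kimi's rate, which is where most of the headline price gap comes from.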
Test 1: Local Bounty Board
Opus 4.7: Clean MVP. Express/Zod/Vitest backend, Lua mod, /bounty flow, rewards, leaderboard, tests passed. Stats:
- Cost: ~$3.59
- Time: 12min API, 23min wall
- Code: +1,688 / -0
- Output tokens: 54.8k
- Cache read: 2.8M
Kimi K2.6: Got the local board working too, but messier. Stats:
- Cost: ~$0.39
- Time: ~9min 27sec
- Code: +4,671 / -0 (about 2.8× Opus's 1,688 lines)
The annoying part: Minetest config. It wrote secure.http_mods = bountykimi in the global config, but created a world-level config with a different mod name, so the HTTP API was not enabled for the running mod. The tester spent 30+ minutes debugging it.
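The mismatch described here is easy to check mechanically. Minetest's secure.http_mods setting is a comma-separated list of mod names allowed to use the HTTP API, so a mod whose name is not on that list gets no HTTP access. The sketch below assumes a simplified `key = value` config format; `parseConf` and `httpEnabledFor` are hypothetical helpers, and "bounty_board" is a made-up stand-in because the post does not name the mod that was actually running.

```typescript
// Parse "key = value" lines from minetest.conf-style text, skipping comments.
function parseConf(text: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    out.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return out;
}

// True when modName appears in the comma-separated secure.http_mods list.
function httpEnabledFor(modName: string, confText: string): boolean {
  const list = parseConf(confText).get("secure.http_mods") ?? "";
  return list.split(",").map((s) => s.trim()).includes(modName);
}

const globalConf = "secure.http_mods = bountykimi\n";

console.log(httpEnabledFor("bountykimi", globalConf));   // true
console.log(httpEnabledFor("bounty_board", globalConf)); // false ("bounty_board" is hypothetical)
```

A check like this, run against the name of the mod the world actually loads, would have surfaced the problem before any in-game debugging.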
Test 2: Composio + Google Sheets
Opus 4.7: Got Google Sheets sync working. After some back-and-forth on tsx watch and env loading, backend could complete a bounty and append to Sheets. Stats:
- Cost: $16.03
- Time: 28min API, 1hr 17min wall
- Code: +1,848 / -507
- Cache read: 22.3M
- Output: 123.3k tokens
Kimi K2.6: Failed. Stuck on dev server issues, tests, build problems. Never wired the Composio integration into a working state. After ~25 min and 135k+ tokens, tester stopped. Cost: ~$5.03.
Takeaway
- Best local MVP: Opus, but Kimi is way better value
- Best real integration: Opus by a lot
- Cleaner code: Opus
- Cheaper experiment model: Kimi
Kimi K2.6 looks interesting for cheaper local coding tasks: a working Lua+TypeScript mod for $0.39 is not bad. But once the task involved external tools, config issues, and real integration work, Opus 4.7 was clearly ahead.
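The value argument can be checked with quick arithmetic on the reported costs. The sketch below uses only the approximate dollar figures from the two tests; the `RunCost` type and variable names are illustrative, and the Kimi total includes a Test 2 run that never reached a working state.

```typescript
// Reported (approximate) spend per test, in USD.
interface RunCost {
  test1: number;
  test2: number;
}

const opus: RunCost = { test1: 3.59, test2: 16.03 };
const kimi: RunCost = { test1: 0.39, test2: 5.03 };

// How many times more Opus cost for the local MVP that both models finished.
const test1Ratio = opus.test1 / kimi.test1;

// Total spend across both tests.
const opusTotal = opus.test1 + opus.test2;
const kimiTotal = kimi.test1 + kimi.test2;

console.log(test1Ratio.toFixed(1)); // "9.2"
console.log(opusTotal.toFixed(2));  // "19.62"
console.log(kimiTotal.toFixed(2));  // "5.42"
```

On the task both models completed, Opus cost roughly nine times as much; on the integration task, the cheaper run produced nothing usable, so the price gap says little there.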
Full breakdown with commits, screenshots, demos, and costs at the source link.
📖 Read the full source: r/ClaudeAI