Testing MiniMax M2.7 via API on Three Real ML and Coding Workflows

Andrey Lukyanenko put MiniMax M2.7 through three realistic ML and coding workflows via the API, using Claude Code as the harness. The goal: see how M2.7 performs in agentic loops compared to Claude Opus 4.7.
Setup
The test environment wrapped the MiniMax API into a claude-mm command that points Claude Code at M2.7:
claude-mm () {
  # The assignments apply only to this one claude invocation.
  # Shell syntax does not allow spaces around "=".
  ANTHROPIC_BASE_URL="https://api.minimax.io/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$MINIMAX_API_KEY" \
  ANTHROPIC_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_SONNET_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_OPUS_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_HAIKU_MODEL="MiniMax-M2.7" \
  ANTHROPIC_SMALL_FAST_MODEL="MiniMax-M2.7" \
  API_TIMEOUT_MS="3000000" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1" \
  claude "$@"
}
He ran on MiniMax’s Plus tier ($40/month), where the context window and per-day throughput were sufficient for multi-step agentic work.
Workflow 1: Refactoring a PyTorch Project
The task was to update dependencies and code quality in the pytorch_tempest repo (Hydra + PyTorch Lightning). Changes included:
- Updated CI versions and pre-commit hooks.
- Replaced black + flake8 with ruff for linting and formatting.
- Enabled fsdp_sharding_strategy in the Lightning trainer config.
- Refreshed documentation.
- Added uv for environment management.
- Switched to modern Python typing (list[X] over List[X], X | None over Optional[X]).
- Removed duplicate code paths.
The approach was step-by-step: Lukyanenko gave explicit requirements, reviewed each change, and provided feedback when the diff went off scope. M2.7 fit this well because it stayed within narrow prompts and allowed line-level review. CI failures were fixed iteratively with the agent’s help.
Workflow 2: Obsidian Vault Notes
For writing and auditing ML reference notes in Obsidian, Lukyanenko tuned prompts specifically for M2.7. He started by asking both M2.7 and Opus 4.7 to generate notes from the same prompt, then had M2.7 read both outputs and propose an improved prompt for itself. The resulting prompt (condensed) was:
Fill one broken-link stub in the DSWoK vault: research the topic, draft the note in DSWoK voice, run draft-critic-mm, save to the right folder.
Steps: read style guide, pick a stub, grep for cross-references, choose destination folder, draft, then critique.
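The "pick a stub" and "grep for cross-references" steps can be approximated with a short script. This is a sketch under stated assumptions, not the author's tooling: it assumes notes are .md files linked with Obsidian-style [[wikilinks]], treats a link as broken when no note in the vault has a matching file name, and uses the hypothetical function name find_broken_links. It does not special-case embeds such as ![[image.png]], which would be reported as broken.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_links(vault: Path) -> dict[str, list[str]]:
    """Map each link target that has no note to the notes that reference it."""
    notes = sorted(vault.rglob("*.md"))
    existing = {note.stem for note in notes}
    broken: dict[str, list[str]] = defaultdict(list)
    for note in notes:
        text = note.read_text(encoding="utf-8")
        for target in WIKILINK.findall(text):
            # Links may include a folder prefix; compare on the note name only.
            name = target.strip().split("/")[-1]
            if name not in existing and note.name not in broken[name]:
                broken[name].append(note.name)
    return dict(broken)
```

Each key in the result is a candidate stub, and its value lists the notes that already reference it, which is the cross-reference context the drafting step needs.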
Key Findings
Across all three runs, M2.7 was useful when constraints were explicit and output format was concrete. It struggled when important context was left implicit, though Opus 4.7 sometimes had the same gaps. For open-ended cases, a human review pass is still recommended. The author notes that model quality and harness design are hard to separate — a stronger model may infer missing constraints, while a better harness makes them explicit.
📖 Read the full source: HN AI Agents