Testing MiniMax M2.7 via API on Three Real ML and Coding Workflows

By OpenClawRadar. Published: May 21, 2026.

Andrey Lukyanenko put MiniMax M2.7 through three realistic ML and coding workflows via the API, using Claude Code as the harness. The goal: see how M2.7 performs in agentic loops compared to Claude Opus 4.7.

Setup

The setup wraps Claude Code in a claude-mm shell function that points it at M2.7 through MiniMax's Anthropic-compatible endpoint:

claude-mm () {
  # Per-command environment: these variables apply only to this claude call.
  # Shell assignments must not have spaces around "=".
  ANTHROPIC_BASE_URL="https://api.minimax.io/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$MINIMAX_API_KEY" \
  ANTHROPIC_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_SONNET_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_OPUS_MODEL="MiniMax-M2.7" \
  ANTHROPIC_DEFAULT_HAIKU_MODEL="MiniMax-M2.7" \
  ANTHROPIC_SMALL_FAST_MODEL="MiniMax-M2.7" \
  API_TIMEOUT_MS="3000000" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1" \
  claude "$@"
}
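
The wrapper relies on the shell's per-command environment assignments: variables written directly before a command name, with no spaces around `=`, are passed to that one process and do not change the calling session. The sketch below illustrates only that scoping behaviour. `show_model` is a hypothetical stand-in, and it runs `sh -c` in place of the real `claude` binary.

```shell
# Hypothetical stand-in for claude-mm: same prefix-assignment pattern,
# but the child process is `sh -c` rather than the real `claude` binary.
show_model () {
  ANTHROPIC_MODEL="MiniMax-M2.7" \
  sh -c 'printf "%s\n" "$ANTHROPIC_MODEL"'
}

show_model                                 # the child process sees the override
printf '%s\n' "${ANTHROPIC_MODEL:-unset}"  # the calling shell's value is unchanged
```

Because the override lives only in the child process, a plain `claude` invocation in the same terminal should keep using whatever configuration the session already had.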

He ran on MiniMax’s Plus tier ($40/month), where the context window and per-day throughput were sufficient for multi-step agentic work.

Workflow 1: Refactoring a PyTorch Project

The task was to update dependencies and improve code quality in the pytorch_tempest repo (Hydra + PyTorch Lightning). Changes included:

  • Updated CI versions and pre-commit hooks.
  • Replaced black + flake8 with ruff for linting and formatting.
  • Enabled fsdp_sharding_strategy in the Lightning trainer config.
  • Refreshed documentation.
  • Added uv for environment management.
  • Switched to modern Python typing (list[X] over List[X], X | None over Optional[X]).
  • Removed duplicate code paths.
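
The typing change in the list above is mechanical, so progress on it is easy to check from the command line. A rough sketch, assuming a plain grep over the source tree is good enough: `find_legacy_typing` and its directory argument are illustrative names, not something from the original write-up, and the pattern only catches single-line `from typing import ...` statements.

```shell
# List Python files that still import List or Optional from typing.
# find_legacy_typing is an illustrative helper; $1 is the directory to scan
# (defaults to the current directory).
find_legacy_typing () {
  grep -rlE --include='*.py' \
    '^from typing import .*(List|Optional)' "${1:-.}"
}
```

An empty result (grep exits with status 1) suggests the imports are gone. It does not catch `import typing` followed by `typing.List[...]`, or imports wrapped across several lines. Note that `list[X]` requires Python 3.9 or later and `X | None` requires Python 3.10 or later when the annotation is evaluated at runtime.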

The approach was step-by-step: Lukyanenko gave explicit requirements, reviewed each change, and provided feedback when the diff went off scope. M2.7 suited this style of work: it stayed within the scope of narrowly worded prompts, and its changes could be reviewed line by line. CI failures were fixed iteratively with the agent's help.

Workflow 2: Obsidian Vault Notes

For writing and auditing ML reference notes in Obsidian, Lukyanenko tuned prompts specifically for M2.7. He started by asking both M2.7 and Opus 4.7 to generate notes from the same prompt, then had M2.7 read both outputs and propose an improved prompt for itself. The resulting prompt (condensed) was:

Fill one broken-link stub in the DSWoK vault: research the topic, draft the note in DSWoK voice, run draft-critic-mm, save to the right folder.

Steps: read style guide, pick a stub, grep for cross-references, choose destination folder, draft, then critique.
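
The "pick a stub" and "grep for cross-references" steps can be approximated with standard tools. The sketch below is an assumption about how such a search might look, not the author's tooling. `list_stubs` and `xrefs` are hypothetical helpers, and they assume notes are `.md` files linked by bare note name with `[[Name]]`, `[[Name|alias]]`, or `[[Name#heading]]`.

```shell
# Print wikilink targets that have no matching .md file anywhere in the vault.
# list_stubs is a hypothetical helper; $1 is the vault directory.
list_stubs () {
  vault="${1:-.}"
  grep -rhoE --include='*.md' '\[\[[^]|#]+' "$vault" |
    sed 's/^\[\[//' | sort -u |
    while IFS= read -r target; do
      # Assumes links use the bare note name, not a folder path.
      if ! find "$vault" -name "$target.md" | grep -q .; then
        printf '%s\n' "$target"
      fi
    done
}

# List notes that link to a given note name.
# xrefs is a hypothetical helper; $1 is the vault directory, $2 the note name.
xrefs () {
  grep -rlE --include='*.md' "\[\[$2[]|#]" "$1"
}
```

For example, `list_stubs ~/vault` would print candidate stubs and `xrefs ~/vault "Some Note"` the notes that mention one of them (both the path and the note name here are placeholders). Embedded attachments such as `![[image.png]]` will show up as false positives, and note names containing glob or regex metacharacters are not handled.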

Key Findings

Across all three runs, M2.7 was useful when constraints were explicit and the output format was concrete. It struggled when important context was left implicit, though Opus 4.7 sometimes had the same gaps. For open-ended tasks, a human review pass is still recommended. The author notes that model quality and harness design are hard to separate: a stronger model may infer missing constraints, while a better harness makes them explicit.

Read the full source: HN AI Agents
