SkillOpt: Markdown Skills as Trainable Parameters

SkillOpt is a new optimization framework that treats markdown skill files as trainable parameters, applying proper optimization machinery to the ad-hoc skill editing many agent builders already do. The paper (arxiv.org/pdf/2605.23904) formalizes a process: a frontier model proposes bounded edits (add/delete/replace) to markdown skill files, and each edit is gated against a held-out validation set. Only strict improvements are accepted; ties are rejected, and rejected edits become negative signal for subsequent rounds.
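The accept/reject loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `optimize_skill`, `toy_score`, and `make_toy_proposer` are hypothetical names, the proposer stands in for the frontier model, and the scorer stands in for an auto-grader run on the held-out validation set.

```python
from typing import Callable, Iterable, List, Tuple


def optimize_skill(
    skill: str,
    propose: Callable[[str, List[str]], str],
    score: Callable[[str], float],
    rounds: int,
) -> Tuple[str, List[str]]:
    """Greedy gated search: keep a candidate only if it strictly improves the score."""
    best_score = score(skill)
    rejected: List[str] = []  # fed back to the proposer as negative signal
    for _ in range(rounds):
        candidate = propose(skill, rejected)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict inequality, so ties are rejected
            skill, best_score = candidate, candidate_score
        else:
            rejected.append(candidate)
    return skill, rejected


def toy_score(skill: str) -> float:
    """Hypothetical stand-in for a validation-set grader: counts required phrases."""
    required = ("validate input", "write tests")
    return float(sum(phrase in skill for phrase in required))


def make_toy_proposer(lines: Iterable[str]) -> Callable[[str, List[str]], str]:
    """Hypothetical stand-in for the frontier model: appends one scripted line per round."""
    pending = iter(lines)

    def propose(skill: str, rejected: List[str]) -> str:
        # A real proposer would condition on `rejected`; this toy one ignores it.
        return skill + "\n" + next(pending)

    return propose
```

With the scripted proposals `"- be helpful"`, `"- validate input"`, and `"- write tests"`, the first candidate ties the baseline score and is rejected, while the other two each raise the score and are accepted.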

Key Findings

Convergence: The best skills converge after only 1 to 4 accepted edits, out of many more proposals. An edit budget of 4 to 8 per step works best; removing the cap causes performance to collapse.
Skill size: The median final skill is ~920 tokens.
Model transfer: A skill optimized on Codex transferred to Claude Code with zero modification, for a gain of +59.7 on SpreadsheetBench. GPT-4.1 nano with an optimized skill roughly matched frontier models on procedural benchmarks.
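To make the edit budget concrete, here is a small sketch of applying a capped batch of add/delete/replace operations to a skill file. The line-indexed `Edit` representation and the `apply_edits` helper are assumptions for illustration; the source does not say how the paper encodes edits. The default cap of 8 is simply the upper end of the 4 to 8 range reported above.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass(frozen=True)
class Edit:
    """One bounded edit against a skill file, addressed by line index (assumed encoding)."""
    op: Literal["add", "delete", "replace"]
    index: int
    text: str = ""


def apply_edits(lines: List[str], edits: List[Edit], budget: int = 8) -> List[str]:
    """Apply edits in order to a copy of `lines`, refusing batches over the budget.

    Each index refers to the document state at the moment that edit is applied.
    """
    if len(edits) > budget:
        raise ValueError(f"{len(edits)} edits exceed the budget of {budget}")
    out = list(lines)  # never mutate the caller's list
    for edit in edits:
        if edit.op == "add":
            out.insert(edit.index, edit.text)
        elif edit.op == "delete":
            del out[edit.index]
        elif edit.op == "replace":
            out[edit.index] = edit.text
        else:
            raise ValueError(f"unknown op: {edit.op!r}")
    return out
```

Rejecting an oversized batch outright, rather than truncating it, keeps each proposal small enough that the validation gate can attribute a score change to a handful of edits.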

Limitations

The validation gate requires an auto-grader with clear correct answers. This works for code and spreadsheets but breaks down for open-ended tasks, where no single output can be graded as correct.
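As a rough illustration of what "clear correct answers" means in practice, the sketch below scores an agent by exact match against expected outputs. The `exact_match_accuracy` function and the `run_agent` callable are hypothetical; real benchmarks such as SpreadsheetBench use their own graders.

```python
from typing import Callable, Sequence, Tuple


def exact_match_accuracy(
    run_agent: Callable[[str], str],
    cases: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of (task, expected) pairs where the agent's output matches exactly.

    Surrounding whitespace is ignored; anything else must match character for character.
    """
    if not cases:
        raise ValueError("validation set must not be empty")
    correct = sum(
        run_agent(task).strip() == expected.strip() for task, expected in cases
    )
    return correct / len(cases)
```

A grader like this only makes sense when each task has one checkable answer, which is why the same gate cannot be applied directly to open-ended writing or design tasks.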

Who It's For

Developers building AI coding agents who want to systematically optimize skill files rather than relying on manual iteration or ad-hoc prompt engineering.

📖 Read the full source: r/LocalLLaMA
