SkillOpt: Optimizing Markdown Skill Files as Trainable Parameters for AI Agents

SkillOpt is a new optimization framework that treats markdown skill files as trainable parameters, bringing systematic optimization to the ad-hoc skill editing that many agent builders already do by hand. The paper (arxiv.org/pdf/2605.23904) formalizes the loop: a frontier model proposes bounded edits (add, delete, or replace) to a markdown skill file, and each edit is gated against a held-out validation set. Only strict improvements are accepted; ties are rejected, and rejected edits become negative signal for subsequent rounds.
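A minimal Python sketch of the accept/reject gate described above may make it more concrete. The `Edit` record, line-indexed addressing, and the `score` callback are illustrative assumptions, not SkillOpt's actual interface. Here `score` stands in for running the agent with a candidate skill on the held-out validation set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Edit:
    """A bounded edit to a skill file, addressed by line index (hypothetical format)."""
    op: str         # "add", "delete", or "replace"
    index: int      # target line in the skill file
    text: str = ""  # new line content for "add" and "replace"


def apply_edit(lines: List[str], edit: Edit) -> List[str]:
    """Return a copy of `lines` with `edit` applied; the input list is not mutated."""
    out = list(lines)
    if edit.op == "add":
        out.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del out[edit.index]
    elif edit.op == "replace":
        out[edit.index] = edit.text
    else:
        raise ValueError(f"unknown edit op: {edit.op!r}")
    return out


def gated_step(
    skill: List[str],
    proposals: List[Edit],
    score: Callable[[List[str]], float],
    rejected: List[Edit],
) -> List[str]:
    """Evaluate proposals one at a time, keeping an edit only on strict improvement."""
    best = score(skill)
    for edit in proposals:
        try:
            candidate = apply_edit(skill, edit)
        except IndexError:
            rejected.append(edit)  # edit no longer fits the current skill file
            continue
        new_score = score(candidate)
        if new_score > best:  # strictly better: accept
            skill, best = candidate, new_score
        else:  # ties and regressions are rejected
            rejected.append(edit)  # kept as negative signal for later rounds (assumed)
    return skill
```

In a real run the `rejected` list would presumably be passed back to the proposer in the next round; how SkillOpt actually encodes that negative signal is not covered here.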
Key Findings
- Convergence: The best skills converge after only 1 to 4 accepted edits, out of many more proposals. An edit budget of 4 to 8 per step works best; removing the cap causes performance to collapse.
- Skill size: The median final skill is ~920 tokens.
- Model transfer: A skill optimized on Codex transferred to Claude Code with no modification and improved its SpreadsheetBench result by +59.7. GPT-4.1 nano with an optimized skill roughly matched frontier models on procedural benchmarks.
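One way to read the edit budget is as a hard cap on how many proposed edits are considered in a single step. The helper below assumes that reading; `cap_edits` is a hypothetical name, and its default of 4 is simply the low end of the reported 4 to 8 range.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cap_edits(proposals: Sequence[T], budget: int = 4) -> List[T]:
    """Keep at most `budget` proposed edits for one step, in the proposer's order."""
    if budget < 1:
        raise ValueError("budget must be a positive integer")
    # Slicing never raises, so having fewer proposals than the budget is fine.
    return list(proposals[:budget])
```

Keeping the cap explicit, rather than optional, mirrors the finding that removing it causes performance to collapse.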
Limitations
The validation gate requires an auto-grader with clear correct answers. This works for code and spreadsheets but breaks down for open-ended tasks that have no single correct answer.
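To see why the gate depends on clear answers, consider a minimal exact-match grader. `run_agent` is a hypothetical callback that runs the agent with a given skill on one task and returns its final answer; the score is the fraction of held-out tasks answered exactly right. For open-ended output there is no single reference string to compare against, so a metric like this has nothing to check.

```python
from typing import Callable, List, Tuple


def exact_match_score(
    run_agent: Callable[[str, str], str],
    skill_text: str,
    validation: List[Tuple[str, str]],
) -> float:
    """Fraction of (task, expected_answer) pairs the agent answers exactly."""
    if not validation:
        raise ValueError("validation set must not be empty")
    # Leading/trailing whitespace is ignored; everything else must match exactly.
    hits = sum(
        run_agent(skill_text, task).strip() == expected.strip()
        for task, expected in validation
    )
    return hits / len(validation)
```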
Who It's For
Developers building AI coding agents who want to systematically optimize skill files rather than relying on manual iteration or ad-hoc prompt engineering.
📖 Read the full source: r/LocalLLaMA