Steelman R5: Fine-tuned 14B Model Outperforms Claude Opus on Ada Code Generation

OpenClawRadar · Published: March 13, 2026

Model and Training Details

The Steelman R5 model is a fine-tuned version of Qwen2.5-Coder-14B-Instruct optimized for Ada code generation. It was trained with 4-bit QLoRA via Unsloth and the TRL SFTTrainer on a dataset of 3,430 Ada/SPARK instruction pairs, every one of which compiles under gnatmake -gnat2022 -gnatwa.
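A compile gate like the one described for the dataset could be sketched as follows. This is a minimal illustration, not the author's pipeline: the function names, the `(instruction, source, unit_name)` triple layout, and the assumption that each example is a single compilable unit are hypothetical. Only the `gnatmake -gnat2022 -gnatwa` command comes from the article. The check looks at the exit status alone; whether the original dataset also rejects examples that merely emit warnings is not stated.

```python
import subprocess
import tempfile
from pathlib import Path

# Command quoted in the article; everything else here is an illustrative assumption.
GNAT_CMD = ("gnatmake", "-gnat2022", "-gnatwa")


def compiles_cleanly(source, unit_name, cmd=GNAT_CMD, timeout=60.0):
    """Return True if the compiler exits with status 0 for the given Ada source.

    GNAT's default naming scheme expects the file name to be the lowercased
    unit name with dots replaced by hyphens, so unit `Hello` goes in `hello.adb`.
    """
    file_name = unit_name.lower().replace(".", "-") + ".adb"
    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / file_name).write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [*cmd, file_name],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Treat a hung compile as a failure.
            return False
    return result.returncode == 0


def filter_pairs(pairs, cmd=GNAT_CMD):
    """Keep only (instruction, source, unit_name) triples whose source compiles."""
    return [p for p in pairs if compiles_cleanly(p[1], p[2], cmd=cmd)]
```

The `cmd` parameter makes it possible to substitute a different toolchain invocation, or a stub command when GNAT is not installed.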

Training configuration: LoRA rank 32, alpha 64, targeting the q/k/v/o/gate/up/down projections. Each round retrained the model from the base checkpoint on the accumulated dataset, because continuing from the previous adapter caused catastrophic forgetting at R2. Each round ran for 1 epoch at a learning rate of 2e-5 with a constant schedule and took about 49 minutes on a rented H100. There were five rounds in total (R1–R5), with R2 discarded.
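The hyperparameters above can be collected into a small record for reference. This is only a sketch: the `LoraRun` class is hypothetical, the Hugging Face hub id and the `*_proj` module names are the usual ones for Qwen2.5 models rather than values confirmed by the article, and the `scaling` property assumes the standard LoRA scaling of alpha divided by rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRun:
    """Hypothetical container for the reported training settings."""

    base_model: str = "Qwen/Qwen2.5-Coder-14B-Instruct"  # assumed hub id
    load_in_4bit: bool = True          # QLoRA 4-bit
    rank: int = 32
    alpha: int = 64
    learning_rate: float = 2e-5
    lr_schedule: str = "constant"
    epochs: int = 1
    # Assumed module names for the q/k/v/o/gate/up/down projections.
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        """Standard LoRA scaling factor applied to the adapter update."""
        return self.alpha / self.rank


run = LoraRun()
print(run.scaling)  # → 2.0
```

With rank 32 and alpha 64, the adapter update is scaled by a factor of 2 under that convention.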

Benchmark Results

Custom Ada Compilation Benchmark (1,000 prompts, first-attempt clean compile):

  • Steelman R5 (14B): 68.6% compile rate
  • Claude Opus 4.6: 42.1% compile rate
  • Claude Sonnet 4.6: 37.2% compile rate
  • Qwen2.5-Coder-14B (base, untuned): ~35% compile rate
  • Claude Sonnet 4: 27.5% compile rate

MultiPL-E HumanEval-Ada (157 problems, pass@1):

  • Steelman R5: 47.1% pass@1, 74.5% compile rate
  • Qwen2.5-Coder-14B (base): 34.4% pass@1, 51.0% compile rate

According to the author, these are the first published Ada pass@1 results on HumanEval for any open model.
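The HumanEval-Ada percentages can be converted back into problem counts to see how compilation and correctness relate. The sketch below assumes each reported percentage is a rounded count out of 157 problems. The conditional pass rate is derived here and is not a figure reported in the source.

```python
TOTAL_PROBLEMS = 157  # MultiPL-E HumanEval-Ada problem count from the article


def implied_count(rate_percent, total=TOTAL_PROBLEMS):
    """Nearest whole number of problems consistent with a reported percentage."""
    return round(rate_percent / 100 * total)


def pass_given_compile(pass_percent, compile_percent, total=TOTAL_PROBLEMS):
    """Return (passed, compiled, share of compiling solutions that also pass)."""
    passed = implied_count(pass_percent, total)
    compiled = implied_count(compile_percent, total)
    return passed, compiled, passed / compiled


# Steelman R5: 47.1% pass@1, 74.5% compile rate
print(pass_given_compile(47.1, 74.5)[:2])  # → (74, 117)

# Qwen2.5-Coder-14B (base): 34.4% pass@1, 51.0% compile rate
print(pass_given_compile(34.4, 51.0)[:2])  # → (54, 80)
```

Under that assumption, roughly 63% of Steelman R5's compiling solutions pass their tests (74 of 117), compared with about 68% for the base model (54 of 80). The fine-tune's gain therefore comes mainly from producing more code that compiles, not from a higher pass rate among compiling solutions.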

Usage and Availability

Run the model with: ollama run hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF

The GGUF version fits in 12GB VRAM with Q4_K_M quantization.
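Once the model has been pulled with the command above, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes an Ollama server on its default port (11434) and that the model is registered under the same name used with `ollama run`. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

MODEL = "hf.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF"
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=300.0):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(generate("Write an Ada procedure that prints the first ten primes."))` would return the model's completion as a single string.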

Limitations

  • Compilation ≠ correctness: 68.6% of outputs compile on the custom benchmark, but only 47.1% of HumanEval-Ada solutions pass their tests (74.5% of them compile)
  • Error-fix capability is weak (5.1%), so don't expect it to debug Ada code
  • SPARK contracts compile but aren't verified with gnatprove
  • Training data is synthetically generated; no human Ada developers wrote these examples
  • 14B model size means it may miss things a larger model would catch

Resources

  • Model: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1
  • GGUF: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
  • Dataset: https://huggingface.co/datasets/the-clanker-lover/steelman-sft-ada

📖 Read the full source: r/LocalLLaMA
